A 35B MoE on a 16 GB GPU, without the offload tax
Tags: Luce Spark, mixture-of-experts, GPU, memory, offload, AI, ML, machine learning, deep learning, neural networks, computer hardware, computer science
Author: GreenGames
Date: 6/8/2026
Article Summary:
Luce Spark is a tool for running large mixture-of-experts (MoE) models on GPUs with limited memory. It pins the most frequently used experts in GPU memory and offloads the rest to the CPU.
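The pinning idea in the summary can be illustrated with a small placement planner. This is a hypothetical sketch, not Luce Spark's actual policy or API. It assumes a routing trace of `(layer, expert)` pairs collected on some calibration data, a uniform size per expert, and a fixed GPU byte budget left over for experts. It ranks experts by how often the router selected them and greedily pins the top ones until the budget is full. The function name `plan_expert_placement` and all of its parameters are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable

Expert = tuple[int, int]  # (layer index, expert index); an assumed identifier format


def plan_expert_placement(
    routing_trace: Iterable[Expert],
    all_experts: list[Expert],
    expert_bytes: int,
    gpu_budget_bytes: int,
) -> tuple[set[Expert], set[Expert]]:
    """Hypothetical planner: split experts into GPU-pinned and CPU-offloaded sets.

    routing_trace: one (layer, expert) entry per token routed to that expert.
    all_experts: every expert in the model, so unseen experts still get placed.
    expert_bytes: assumed uniform weight size of a single expert.
    gpu_budget_bytes: assumed GPU memory available for expert weights.
    """
    counts = Counter(routing_trace)
    # How many experts fit in the assumed budget.
    capacity = gpu_budget_bytes // expert_bytes
    # Rank by observed routing frequency, highest first. Experts absent from
    # the trace count as 0; the stable sort keeps ties in all_experts order.
    ranked = sorted(all_experts, key=lambda e: counts[e], reverse=True)
    pinned = set(ranked[:capacity])
    offloaded = set(ranked[capacity:])
    return pinned, offloaded


# Toy example: 2 layers x 4 experts, room for 2 experts on the GPU.
trace = [(0, 1), (0, 1), (0, 3), (1, 0), (0, 1), (1, 0)]
experts = [(layer, e) for layer in range(2) for e in range(4)]
pinned, offloaded = plan_expert_placement(trace, experts, expert_bytes=100, gpu_budget_bytes=250)
print(sorted(pinned))   # [(0, 1), (1, 0)]
print(len(offloaded))   # 6
```

A real system would also have to account for non-uniform expert sizes, the memory taken by attention and shared weights, and routing patterns that drift between the calibration data and live traffic; the sketch ignores all of these.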