A 35B MoE on a 16 GB GPU, without the offload tax

Software Development, AI & Machine Learning, Hardware & Electronics (lucebox.com)
Luce Spark, mixture-of-experts, GPU, memory, offload, AI, ML, machine learning, deep learning, neural networks, computer hardware, computer science

Author: GreenGames

Date: 6/8/2026

Article Summary:
Luce Spark is a tool for running large mixture-of-experts (MoE) models on GPUs with limited memory, such as a 35B model on a 16 GB GPU. It pins the most frequently used experts in GPU memory and offloads the rest to the CPU.
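
The pinning idea in the summary can be illustrated with a minimal sketch. This is not Luce Spark's code or API: the function names, the uniform expert size, and the routing trace below are all hypothetical, and the sketch assumes that per-expert usage counts have already been collected from a sample workload.

```python
from collections import Counter


def choose_pinned_experts(usage_counts, expert_size_bytes, gpu_budget_bytes):
    """Split experts into (pinned, offloaded) sets by usage frequency.

    Hypothetical helper for illustration only, not part of Luce Spark.
    Assumes every expert occupies the same number of bytes.
    """
    # How many whole experts fit into the GPU memory budget.
    capacity = gpu_budget_bytes // expert_size_bytes
    # Most frequently used first; ties broken by expert id for determinism.
    ranked = sorted(usage_counts, key=lambda e: (-usage_counts[e], e))
    pinned = set(ranked[:capacity])
    offloaded = set(ranked[capacity:])
    return pinned, offloaded


def gpu_hit_rate(usage_counts, pinned):
    """Fraction of expert activations served by experts pinned on the GPU."""
    total = sum(usage_counts.values())
    if total == 0:
        return 0.0
    hits = sum(count for expert, count in usage_counts.items() if expert in pinned)
    return hits / total


# Made-up routing trace: the expert id selected for each of ten tokens.
trace = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
counts = Counter(trace)

# Assume each expert takes 100 bytes and the GPU budget is 250 bytes,
# so only two experts can be pinned.
pinned, offloaded = choose_pinned_experts(counts, 100, 250)
print(sorted(pinned), sorted(offloaded), gpu_hit_rate(counts, pinned))
```

In this toy trace, pinning the two most used experts covers seven of the ten activations, so only the remaining three would need an expert held on the CPU side. Experts that never appear in the trace are not ranked at all here; a real system would have to decide where those live.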