A 10 year old Xeon is all you need
AI, machine learning, performance engineering, inference engine, optimization, Intel Xeon, DDR3 RAM, GPU, Mixture-of-Experts, speculative decoding, memory allocation, CPU cache optimization
Author: cafkafk
Date: 6/1/2026
Article Summary:
The author runs a 26-billion-parameter Mixture-of-Experts AI model on a 10-year-old Intel Xeon server with 128 GB of DDR3 RAM and no GPU, relying on inference-engine optimizations in areas such as speculative decoding, memory allocation, and CPU cache usage.