Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon
Software Development, Programming Languages, Algorithms & Data Structures (tridao.me)
Tags: Gram Newton-Schulz, Newton-Schulz, optimizer, Muon, algorithm, stability, efficiency, trillion-parameter models
Author: jxmorris12
Date: 6/9/2026
Article Summary:
This article presents Gram Newton-Schulz, a reworking of the Newton-Schulz routine used by the Muon optimizer that reduces optimizer time by up to 50% in trillion-parameter models. The algorithm is designed to be more efficient and more stable than the standard Newton-Schulz method.
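
For orientation, here is a minimal NumPy sketch of the kind of iteration the summary refers to. It is an illustration under stated assumptions, not the article's implementation: it uses the classic cubic Newton-Schulz update rather than whatever polynomial the article or Muon actually uses, and it assumes that "Gram" refers to running the iteration on the small square matrix G = X Xᵀ instead of on the rectangular X, which the summary does not spell out. The function names `newton_schulz` and `gram_newton_schulz` are hypothetical, and the hardware-aware aspects named in the title are not modeled.

```python
import numpy as np


def newton_schulz(X, steps=30):
    """Classic cubic Newton-Schulz: X <- 1.5 X - 0.5 (X X^T) X.

    For a full-rank X with n <= m (shape n x m), the iterates approach the
    orthogonal polar factor U V^T of X = U S V^T.
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the convergence region of the cubic iteration.
    X = X / np.linalg.norm(X)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X


def gram_newton_schulz(X, steps=30):
    """Same iteration, rewritten to touch only n x n matrices in the loop.

    Assumption (not confirmed by the summary): track G = X X^T and the
    accumulated polynomial factor Q, then apply Q to X once at the end.
    """
    X = X / np.linalg.norm(X)
    n = X.shape[0]
    I = np.eye(n)
    G = X @ X.T          # small Gram matrix, n x n
    Q = I.copy()         # product of all per-step factors so far
    for _ in range(steps):
        P = 1.5 * I - 0.5 * G   # per-step factor, so X_next = P @ X
        Q = P @ Q
        G = P @ G @ P           # Gram matrix of X_next (P is symmetric)
    return Q @ X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 8))
    A = newton_schulz(X)
    B = gram_newton_schulz(X)
    print("max |A - B|      :", np.abs(A - B).max())
    print("max |A A^T - I|  :", np.abs(A @ A.T - np.eye(4)).max())
```

In exact arithmetic the two functions compute the same result, because each step multiplies X on the left by a polynomial in the current Gram matrix. The difference is where the work happens: the rectangular version performs n × m products at every step, while the Gram version performs only n × n products inside the loop and a single n × m product at the end, which is cheaper when n is much smaller than m.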