Dispersion loss counteracts embedding condensation in small language models

AI & Machine Learning (chenliu-1996.github.io)

language models, embedding condensation, dispersion loss, Transformers, representation regularization

Author: E-Reverance

Date: 7/3/2026

Article Summary:

This paper presents an observation-driven improvement to language model training. It identifies embedding condensation, an effect in which token embedding vectors collapse into narrow cones and reduce the expressivity of Transformers, and introduces a dispersion loss to counteract it.
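
To make the idea of condensation concrete, here is a minimal sketch in plain Python. It measures how tightly a set of vectors clusters using mean pairwise cosine similarity, then adds that measure as a penalty to a task loss. This summary does not give the paper's actual loss formula, so the penalty form, the function names (`mean_pairwise_cosine`, `total_loss`), and the `weight` value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of vectors.

    Values near 1 mean the vectors point in almost the same direction,
    i.e. they sit inside a narrow cone (condensed). Needs >= 2 vectors.
    """
    n = len(embeddings)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(embeddings[i], embeddings[j])
            pairs += 1
    return total / pairs


def total_loss(task_loss, embeddings, weight=0.1):
    """Hypothetical combined objective (ASSUMPTION, not the paper's formula):
    the task loss plus a weighted penalty on mean pairwise cosine similarity,
    so that tightly clustered embeddings cost more than spread-out ones.
    """
    return task_loss + weight * mean_pairwise_cosine(embeddings)


# Toy 2-D "embeddings": one set packed into a narrow cone, one spread out.
condensed = [[1.0, 0.0], [0.99, 0.1], [0.98, -0.1]]
dispersed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(mean_pairwise_cosine(condensed))
print(mean_pairwise_cosine(dispersed))
```

With the same task loss, the condensed set receives the larger total loss, which is the kind of pressure a dispersion term is meant to apply. A real training setup would compute a differentiable version of this on batched tensors rather than on Python lists.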