Dispersion loss counteracts embedding condensation in small language models
language models, embedding condensation, dispersion loss, Transformers, representation regularization
Author: E-Reverance
Date: 7/3/2026
Article Summary:
This paper presents an observation-driven improvement to language model training: a dispersion loss that counteracts embedding condensation, the effect in which token embedding vectors collapse into narrow cones and thereby reduce the expressivity of Transformers.
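To make the two ideas concrete, here is a minimal sketch in pure Python. It is written under stated assumptions and is not the paper's method. `mean_pairwise_cosine` is one common way to quantify condensation: values near 1 mean the vectors share a narrow cone. `dispersion_loss` is a hypothetical regularizer chosen for illustration. It is an InfoNCE-style repulsion term with no positive pairs: the log of the mean of exp(-||z_i - z_j||^2 / tau) over distinct pairs of L2-normalized vectors. The exact loss, the temperature `tau`, and which layer's embeddings it is applied to are all assumptions. In real training this term would presumably be computed on a batch of hidden states in an autograd framework and added, with some weight, to the language-modeling loss.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of vectors.

    Near 1: vectors sit in a narrow cone (condensed).
    Near 0 or negative: vectors are spread out.
    """
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def dispersion_loss(embeddings, tau=1.0):
    """Hypothetical dispersion regularizer (assumed form, not the paper's).

    Returns log(mean_{i<j} exp(-||z_i - z_j||^2 / tau)) on L2-normalized
    vectors. Every term is <= 1, so the loss is <= 0. It decreases as the
    vectors move apart, so minimizing it pushes embeddings out of a cone.
    """
    normed = []
    for z in embeddings:
        n = math.sqrt(sum(a * a for a in z))
        normed.append([a / n for a in z])
    terms = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / tau)
        for u, v in combinations(normed, 2)
    ]
    return math.log(sum(terms) / len(terms))


# Toy 2-D "embeddings": one condensed set, one spread ~120 degrees apart.
condensed = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1]]
spread = [[1.0, 0.0], [-0.5, 0.866], [-0.5, -0.866]]

print("condensation (condensed):", mean_pairwise_cosine(condensed))
print("condensation (spread):   ", mean_pairwise_cosine(spread))
print("dispersion loss (condensed):", dispersion_loss(condensed))
print("dispersion loss (spread):   ", dispersion_loss(spread))
```

On these toy inputs the condensed set scores close to 1 on the similarity metric, and the spread set scores about -0.5. The condensed set also has the higher (worse) dispersion loss, which is the direction a regularizer against condensation needs.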