Show HN: I trained a language model that thinks the capital of Japan is Paris
Tags: language model, AI research, DIMBA, Mamba-2, diffusion language models, self-correction, critic head, LoRA, shared weights, GPU sponsorships
Author: farisallafi
Date: 7/5/2026
Article Summary:
A 13-year-old developer describes training a language model that thinks the capital of Japan is Paris, and presents their research on DIMBA, a new architecture that combines the efficiency of Mamba-2 with the parallel generation of diffusion language models.