Nemotron 3 Ultra: Open MoE Hybrid Mamba-Transformer for Agentic Reasoning [pdf]

AI & Machine Learning, Software Development, Other: Large Language Model (research.nvidia.com)

Nemotron 3 Ultra, large language model, Mixture-of-Experts, hybrid Mamba-Attention, pre-training, inference throughput, accuracy, agentic reasoning, code generation, math reasoning, long-context language models.

Author: victormustar

Date: 6/4/2026

Article Summary:

Nemotron 3 Ultra is a large language model that matches the accuracy of other open LLMs while delivering significantly higher inference throughput in the 8K-input / 64K-output token setting. It uses a Mixture-of-Experts hybrid Mamba-Attention architecture and was pre-trained on 20 trillion text tokens.