Nemotron 3 Ultra: Open MoE Hybrid Mamba-Transformer for Agentic Reasoning [pdf]
AI & Machine Learning, Software Development, Other: Large Language Model (research.nvidia.com)
Nemotron 3 Ultra, large language model, Mixture-of-Experts, hybrid Mamba-Attention, pre-training, inference throughput, accuracy, agentic reasoning, code generation, math reasoning, long-context language models
Author: victormustar
Date: 6/4/2026
Article Summary:
Nemotron 3 Ultra is a large language model that matches the accuracy of other open LLMs while delivering significantly higher inference throughput in the 8K-input / 64K-output token setting. It uses a Mixture-of-Experts hybrid Mamba-Attention architecture and was pre-trained on 20 trillion text tokens.