Nemotron 3 Super

262K context · $0.090/M input tokens · $0.450/M output tokens · 120B total parameters, 12B active

NVIDIA Nemotron 3 Super is a 120B-parameter hybrid Mixture-of-Experts (MoE) model designed for compute efficiency and accuracy in complex multi-agent applications. It features a hybrid Mamba-Transformer architecture with multi-token prediction (MTP) and a 1M token context window for long-term coherence, cross-document reasoning, and multi-step task planning. The model leverages latent MoE to activate only 12B parameters during inference, enabling high intelligence and generalization at reduced computational cost. It is trained across 10+ environments using multi-environment reinforcement learning and achieves leading accuracy on benchmarks such as AIME 2025, TerminalBench, and SWE-Bench Verified.
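As a rough illustration of the sparse-activation saving, the sketch below compares per-token compute for the 12B-active configuration against a hypothetical dense 120B forward pass. It uses the common ~2 FLOPs per parameter per token approximation and ignores routing and attention costs, so treat the resulting ~10x figure as indicative only.

    # Back-of-the-envelope compute comparison: sparse MoE vs. a dense
    # model of the same total size, at ~2 FLOPs per parameter per token.
    # Routing overhead and attention costs are ignored (illustrative only).
    TOTAL_PARAMS = 120e9    # 120B total parameters
    ACTIVE_PARAMS = 12e9    # ~12B parameters activated per token

    flops_dense = 2 * TOTAL_PARAMS   # hypothetical dense 120B forward pass
    flops_moe = 2 * ACTIVE_PARAMS    # sparse forward pass, active experts only

    print(f"Dense 120B:        ~{flops_dense:.1e} FLOPs/token")
    print(f"MoE (12B active):  ~{flops_moe:.1e} FLOPs/token")
    print(f"Compute reduction: ~{flops_dense / flops_moe:.0f}x")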

What is Nemotron 3 Super?

Nemotron 3 Super is an AI model from NVIDIA that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Nemotron 3 Super against other models for agent workflows and production deployments.
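At the listed OpenRouter rates ($0.090 per million input tokens, $0.450 per million output tokens), per-request cost works out as in the short sketch below; the token counts are illustrative, not measurements.

    # Per-request cost at the listed rates (USD per 1M tokens).
    INPUT_PRICE_PER_M = 0.090
    OUTPUT_PRICE_PER_M = 0.450

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Return the USD cost of one request at the listed rates."""
        return (
            input_tokens / 1e6 * INPUT_PRICE_PER_M
            + output_tokens / 1e6 * OUTPUT_PRICE_PER_M
        )

    # Example: a long-context agent turn with 200K input and 4K output tokens.
    print(f"${request_cost(200_000, 4_000):.4f}")  # -> $0.0198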

Architecture & Specifications
Architecture: Hybrid Mamba-Transformer Mixture-of-Experts (MoE)
Parameters: 120B total, 12B active
Tokenizer: Other
License: NVIDIA Open License
Released: 2026-03-11

Modalities
Input: text
Output: text
Supported Parameters
frequency_penalty, include_reasoning, logit_bias, max_tokens, min_p, presence_penalty, reasoning, repetition_penalty, response_format, seed, stop, temperature, tool_choice, tools, top_k, top_p
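
Since this page sources pricing from the OpenRouter API, the model is presumably reachable through OpenRouter's OpenAI-compatible chat completions endpoint. The minimal sketch below exercises a few of the supported sampling parameters; the model slug nvidia/nemotron-3-super is an assumed placeholder, so verify the exact ID in the provider catalog before use.

    # Minimal chat-completion call through OpenRouter's OpenAI-compatible
    # endpoint, using several of the parameters listed above.
    import os
    import requests

    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "nvidia/nemotron-3-super",  # assumed slug; verify before use
            "messages": [
                {"role": "user", "content": "Summarize the tradeoffs of MoE models."}
            ],
            "temperature": 0.7,
            "top_p": 0.95,
            "max_tokens": 512,
            "seed": 42,
        },
        timeout=60,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])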
Strengths
  • Efficient compute with latent MoE activating only 12B parameters
  • 1M token context window for long-term coherence and reasoning
  • Multi-environment RL training for high accuracy across benchmarks
  • Multi-token prediction for faster token generation
  • Open customization and deployment under NVIDIA Open License
Limitations
  • Limited public information on training data sources and knowledge cutoff date
  • Relatively weak performance on some benchmarks, such as CritPt
  • High computational requirements when deploying the full 120B parameters
  • License trial-use restrictions may complicate some production deployments
  • Relatively low GDPval-AA score, indicating limited coverage of economically valuable tasks
Recommended Use Cases
  • Multi-agent applications requiring long-term coherence
  • Cross-document reasoning and multi-step task planning
  • Scientific reasoning and graduate-level problem solving
  • Agentic coding and terminal use (see the tool-calling sketch below)
  • Customizable AI deployment across workstations and cloud environments
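
Because tools and tool_choice appear among the supported parameters, agentic-coding and terminal-use workflows can attach function definitions to a request. A sketch follows, again assuming the hypothetical OpenRouter slug above; the run_shell tool schema is purely illustrative.

    # Sketch of a tool-calling request for an agentic/terminal-use workflow.
    # The model slug and the run_shell tool are illustrative assumptions.
    import os
    import requests

    tools = [{
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Execute a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }]

    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "nvidia/nemotron-3-super",  # assumed slug; verify before use
            "messages": [{"role": "user", "content": "List the files in the repo root."}],
            "tools": tools,
            "tool_choice": "auto",
        },
        timeout=60,
    )
    response.raise_for_status()
    message = response.json()["choices"][0]["message"]
    print(message.get("tool_calls") or message["content"])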


Data enriched Apr 24, 2026. Pricing from OpenRouter API.