Nemotron 3 Super (free)
NVIDIA · Nemotron · Free · Released 2026-03-11

262K context · 120B total, 12B active

NVIDIA Nemotron 3 Super is a 120B-parameter hybrid Mamba-Transformer Mixture-of-Experts (MoE) model designed for efficiency and accuracy in complex multi-agent applications. It offers a 1M-token context window for long-term coherence, cross-document reasoning, and multi-step task planning. A latent MoE design activates only 12B of the 120B parameters per inference step, delivering high intelligence and generalization at reduced computational cost. Multi-environment reinforcement learning across more than ten environments improves its accuracy on benchmarks such as AIME 2025, TerminalBench, and SWE-Bench Verified.
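NVIDIA has not published the routing details behind the "latent MoE" wording, but the active/total split follows the usual sparse-expert pattern: a router selects a few expert networks per token, so only a fraction of the weights participates in any forward pass. The following is a generic top-k MoE sketch in Python/NumPy with made-up dimensions; it illustrates the technique, not Nemotron's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; hypothetical stand-ins, not Nemotron's real config.
d_model, d_ff = 64, 256
n_experts, top_k = 16, 2  # only top_k experts run per token

# One feed-forward "expert" = an up-projection and a down-projection.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert ids
    # Softmax over the selected experts' logits only.
    sel = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per-token dispatch
        for k in range(top_k):
            w_in, w_out = experts[top[t, k]]
            h = np.maximum(x[t] @ w_in, 0.0)       # ReLU expert MLP
            out[t] += gates[t, k] * (h @ w_out)
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)
print(y.shape, f"active experts per token: {top_k}/{n_experts}")
```

With top_k = 2 of 16 experts, each token touches only an eighth of the expert weights, loosely mirroring the 12B-of-120B active ratio that keeps per-token compute well below that of a dense 120B model.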

What is Nemotron 3 Super (free)?

Nemotron 3 Super (free) is an AI model from NVIDIA that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Nemotron 3 Super (free) against other models for agent workflows and production deployments.
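Pricing for this page comes from the OpenRouter API (see the note at the bottom), so the model is presumably reachable through OpenRouter's OpenAI-compatible endpoint. A minimal sketch, assuming the slug nvidia/nemotron-3-super:free (hypothetical, since this page does not list the actual model ID) and an OPENROUTER_API_KEY environment variable:

```python
import os
from openai import OpenAI  # pip install openai

# OpenRouter exposes an OpenAI-compatible API surface.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-super:free",  # hypothetical slug; verify before use
    messages=[
        {"role": "system", "content": "You are a planning agent."},
        {"role": "user", "content": "Outline a three-step plan to triage a failing CI build."},
    ],
)
print(resp.choices[0].message.content)
```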

Architecture & Specifications
Architecture: Hybrid Mamba-Transformer Mixture-of-Experts (MoE)
Parameters: 120B total, 12B active
Tokenizer: Other
License: NVIDIA Open License
Released: 2026-03-11

Modalities
Input: text
Output: text
Supported Parameters
include_reasoning, max_tokens, reasoning, response_format, seed, structured_outputs, temperature, tool_choice, tools, top_p
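A short sketch exercising a few of these parameters together (temperature, max_tokens, seed, and a JSON response_format) against OpenRouter's chat completions endpoint; the field names follow OpenRouter's documented request schema, while the model slug remains the hypothetical one used above:

```python
import os
import requests

payload = {
    "model": "nvidia/nemotron-3-super:free",  # hypothetical slug; verify before use
    "messages": [{"role": "user", "content": 'Return {"status": "ok"} as JSON.'}],
    "temperature": 0.2,       # low randomness
    "max_tokens": 256,        # cap completion length
    "seed": 42,               # best-effort reproducibility
    "response_format": {"type": "json_object"},  # uses the structured-output support
}
r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=60,
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```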
Strengths
  • Efficient activation of 12B parameters for inference
  • 1M token context window for long-term coherence
  • Multi-environment reinforcement learning for improved accuracy
  • Latent MoE enabling cost-effective expert activation
  • High token generation rate compared to leading open models
Limitations
  • Not suitable for production or business-critical systems
  • Prompts and outputs are logged, raising privacy concerns
  • Limited accuracy in research-level physics reasoning (CritPt: 3.1%)
  • Relatively high hallucination rate (13.0%) in knowledge tasks
  • Lower performance in economically valuable tasks (GDPval-AA: 25.3%)
Recommended Use Cases
Multi-agent applications (see the tool-calling sketch after this list)
Cross-document reasoning
Long-term task planning
Scientific computing and coding
Interactive roleplay and storytelling
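For the multi-agent use case, the tools and tool_choice parameters listed above are the relevant hooks. A sketch of a single tool-call round trip, with a made-up search_docs tool and the same hypothetical model slug:

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# One illustrative tool; the name and schema are invented for this sketch.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search an internal document store.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-super:free",  # hypothetical slug; verify before use
    messages=[{"role": "user", "content": "Find our retry-policy guidelines."}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model elected to call a tool
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:               # the model answered directly instead
    print(msg.content)
```

In a real agent loop you would execute the returned call, append a tool-role message with the result, and query the model again.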

Data enriched Apr 24, 2026. Pricing from OpenRouter API.