NVIDIA

Nemotron 3 Super

2026-03-11

Nemotron 3 Super is NVIDIA's open hybrid Mamba-Transformer mixture-of-experts (MoE) model with 120 billion total parameters, of which only 12 billion are activated, keeping compute cost low. Its hybrid architecture combines Mamba layers for sequence efficiency with Transformer layers for precision reasoning and delivers over 5× the throughput of its predecessor. With a native 1M-token context window and NVFP4 precision optimized for Blackwell GPUs, it scores 85.6% on PinchBench, the best among open models, which makes it well suited to complex multi-agent applications, software development, and agentic reasoning.

Reasoning | Open Model
Knowledge Cutoff
2026-02-01
Input → Output Format
Context Memory
262K in / 1M out
Cost/1M Words
$0.09 in / $0.45 out
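
To make the listed prices concrete, here is a minimal cost-estimate sketch. It assumes the rates are US dollars per 1M words exactly as listed ($0.09 input, $0.45 output) and that cost scales linearly with volume. The function name `estimate_cost` and the sample word counts are illustrative and not part of any NVIDIA or provider API.

```python
# Listed rates, taken at face value as USD per 1M words (assumption: linear pricing).
INPUT_RATE_PER_M = 0.09
OUTPUT_RATE_PER_M = 0.45


def estimate_cost(input_words: int, output_words: int) -> float:
    """Return the estimated cost in USD for the given word counts."""
    if input_words < 0 or output_words < 0:
        raise ValueError("word counts must be non-negative")
    input_cost = input_words / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_words / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 200,000 words in, 50,000 words out.
    print(f"${estimate_cost(200_000, 50_000):.4f}")  # prints $0.0405
```

Because output is priced at five times the input rate, output volume dominates the bill for generation-heavy workloads.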

AI Performance Evaluation

Arena Overall Score
1361 ±7 (as of 2026-05-01)
Overall Rank
No. 151 (7,409 votes)
Arena by Ability
Hard Prompts
1380 ±9, No. 149
Expert Knowledge
1398 ±24, No. 127
Instruction Following
1347 ±13, No. 154
Conversation Memory
1349 ±17, No. 156
Creative
1302 ±18, No. 182
Coding
1409 ±14, No. 149
Math
1379 ±25, No. 137
Arena by Occupation
Creative Writing
1324 ±15, No. 168
Social Sciences
1366 ±17, No. 163
Media
1317 ±17, No. 160
Business
1350 ±16, No. 164
Healthcare
1350 ±26, No. 175
Legal
1368 ±26, No. 158
Software
1404 ±11, No. 146
Mathematics
1398 ±27, No. 116
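
Arena scores are easier to interpret as head-to-head win rates. The sketch below assumes the scores sit on the conventional Elo scale (base 10, 400-point divisor); that is an assumption about how the leaderboard is scaled, not something stated on this page. The function name `expected_win_rate` and the 1400-rated opponent are hypothetical.

```python
def expected_win_rate(score_a: float, score_b: float) -> float:
    """Expected share of pairwise votes won by model A over model B.

    Uses the standard Elo expectation formula (assumed scale: base 10, divisor 400).
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


if __name__ == "__main__":
    nemotron_overall = 1361.0     # Arena Overall Score from this page
    hypothetical_rival = 1400.0   # illustrative opponent, not a real entry
    rate = expected_win_rate(nemotron_overall, hypothetical_rival)
    print(f"Expected win rate vs. a 1400-rated model: {rate:.1%}")
```

Under this assumption, a gap of a few dozen points corresponds to a modest shift away from a 50/50 split, so nearby ranks with overlapping ± intervals are not meaningfully separated.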
Overall
AA Intelligence Index
36% ↓3%
LiveBench
32% ↓29%
Reasoning & Math
GPQA Diamond
80% ↓2%
HLE
19% ↑2%
LB Reasoning
34% ↓35%
LB Math
36% ↓38%
LB Data
21% ↓32%
Coding
AA Coding Index
31% ↓5%
LB Coding
54% ↓19%
LB Agentic
23% ↓22%
TAU2
68% ↓13%
TerminalBench
29% ↓5%
SciCode
36% ↓6%
Language & Instructions
IFBench
72% ↑8%
AA-LCR
60% ↓2%
LB Language
30% ↓42%
LB IF
28% ↓23%
Output Speed
Standard Mode
80 tok/s ↑2
First Output 1.88 s
Reasoning Mode
189 tok/s ↑102
First Output 11.59 s
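
The speed figures can be combined into a rough end-to-end latency estimate: time to first output plus the response length divided by throughput. This is a simplified sketch that assumes steady throughput once output starts and treats "First Output" as time to first token. The names `SpeedProfile` and `estimate_latency` and the 1,000-token response length are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedProfile:
    tokens_per_second: float      # sustained output speed
    first_output_seconds: float   # delay before the first output arrives


# Figures copied from the Output Speed listing above.
STANDARD = SpeedProfile(tokens_per_second=80.0, first_output_seconds=1.88)
REASONING = SpeedProfile(tokens_per_second=189.0, first_output_seconds=11.59)


def estimate_latency(profile: SpeedProfile, output_tokens: int) -> float:
    """Estimated seconds until a response of `output_tokens` tokens completes."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return profile.first_output_seconds + output_tokens / profile.tokens_per_second


if __name__ == "__main__":
    for name, profile in (("standard", STANDARD), ("reasoning", REASONING)):
        print(f"{name}: {estimate_latency(profile, 1000):.2f} s for 1,000 tokens")
```

For short responses the longer first-output delay dominates in reasoning mode, while for long responses its higher throughput narrows the gap.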