xAI

Grok 4.20

2026-03-09

Grok 4.20 is xAI's newest flagship model, released in February 2026. It introduces a native four-agent architecture in which specialized AI agents collaborate simultaneously on complex queries. It maintains a 2M-token context window, the largest among Western frontier models, and achieves a 65% reduction in hallucination rates through cross-agent verification. The model updates its capabilities weekly based on real-world usage and delivers fast direct answers at 232 tokens per second with a 0.54-second time-to-first-token.

xAI SuperGrok Heavy | API | Web Search | Proprietary Model
Knowledge Cutoff: Unknown
Input → Output Format
Context Memory: 2M in, 2M out
Cost/1M Words: $1.25 in, $2.5 out
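As a rough illustration of how the listed prices translate into a per-request cost, the sketch below multiplies word counts by the page's input and output rates. The helper name `estimate_cost` and the example word counts are hypothetical, and the "per 1M words" unit is taken as-is from the listing above.

```python
# Prices as listed on this page, in USD per 1M words.
# The unit ("words") is taken as-is from the listing; it is not verified here.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_words: int, output_words: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_words < 0 or output_words < 0:
        raise ValueError("word counts must be non-negative")
    total = input_words * INPUT_PRICE_PER_M + output_words * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: a 200,000-word prompt and a 5,000-word answer.
    print(f"${estimate_cost(200_000, 5_000):.4f}")
```

Because input is billed at half the output rate, long prompts with short answers are dominated by the input term.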

AI Performance Evaluation

Arena Overall Score: 1480 ±5 (as of 2026-05-01)
Overall Rank: No. 9 (17,413 votes)
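To give a feel for what a gap in Arena score means, the sketch below converts two scores into an expected head-to-head win rate. It assumes the Arena score is on an Elo-style scale (base 10, 400-point spread); the page does not state this. The function name and the opponent score of 1500 are hypothetical.

```python
def expected_win_rate(score_a: float, score_b: float) -> float:
    """Expected probability that model A is preferred over model B.

    Assumes an Elo-style logistic scale (base 10, 400 points); this is an
    assumption about the Arena score, not something the page confirms.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


if __name__ == "__main__":
    grok_score = 1480   # Arena Overall Score from this page
    opponent = 1500     # hypothetical opponent score, for illustration only
    print(f"{expected_win_rate(grok_score, opponent):.3f}")
```

Under this assumption, gaps of a few dozen points correspond to win rates only slightly away from 50%, so differences smaller than the listed ± intervals should not be over-read.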
Arena by Ability
Hard Prompts: 1494 ±6 (No. 15)
Expert Knowledge: 1473 ±16 (No. 42)
Instruction Following: 1455 ±8 (No. 25)
Conversation Memory: 1494 ±12 (No. 10)
Creative: 1467 ±12 (No. 8)
Coding: 1511 ±9 (No. 24)
Math: 1461 ±17 (No. 29)

Arena by Occupation
Creative Writing: 1457 ±10 (No. 16)
Social Sciences: 1485 ±11 (No. 19)
Media: 1455 ±11 (No. 10)
Business: 1476 ±11 (No. 14)
Healthcare: 1512 ±17 (No. 6)
Legal: 1496 ±17 (No. 9)
Software: 1509 ±8 (No. 17)
Mathematics: 1461 ±19 (No. 33)

Overall
AA Intelligence Index: 29% (↓10%)
LiveBench: 38% (↓23%)
ForecastBench: 62% (↑3%)

Reasoning & Math
GPQA Diamond: 78% (↓5%)
HLE: 24% (↑7%)
LB Reasoning: 26% (↓43%)
LB Math: 46% (↓29%)
LB Data: 43% (↓10%)

Coding
AA Coding Index: 22% (↓14%)
LB Coding: 59% (↓14%)
LB Agentic: 38% (↓7%)
TAU2: 60% (↓21%)
TerminalBench: 17% (↓17%)
SciCode: 33% (↓9%)

Language & Instructions
IFBench: 49% (↓14%)
AA-LCR: 17% (↓45%)
LB Language: 42% (↓30%)
LB IF: 24% (↓27%)

Output Speed
Standard Mode: 91 tok/s (↑14), first output 0.50s
Reasoning Mode: 248 tok/s (↑161), first output 11.74s
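
The two modes trade first-output latency against throughput, so which one finishes sooner depends on answer length. The sketch below uses a simplified model (time to first output plus tokens divided by a constant throughput) with the figures listed above. The function names are hypothetical, and real throughput is unlikely to be perfectly constant.

```python
def estimate_response_seconds(output_tokens: int, first_output_s: float,
                              tokens_per_s: float) -> float:
    """Simplified total time: first-output delay plus steady streaming."""
    return first_output_s + output_tokens / tokens_per_s


def break_even_tokens(first_a: float, tps_a: float,
                      first_b: float, tps_b: float) -> float:
    """Output length at which modes A and B take the same total time."""
    if tps_a == tps_b:
        raise ValueError("equal throughput: no break-even point")
    return (first_b - first_a) / (1.0 / tps_a - 1.0 / tps_b)


if __name__ == "__main__":
    STANDARD = (0.50, 91.0)     # first output (s), tok/s, from this page
    REASONING = (11.74, 248.0)  # first output (s), tok/s, from this page
    for n in (500, 5000):
        std = estimate_response_seconds(n, *STANDARD)
        rsn = estimate_response_seconds(n, *REASONING)
        print(f"{n} tokens: standard {std:.1f}s, reasoning {rsn:.1f}s")
    print(f"break-even: {break_even_tokens(*STANDARD, *REASONING):.0f} tokens")
```

Under this model, Standard Mode completes short answers sooner, while Reasoning Mode's higher throughput only pays off for outputs well over a thousand tokens.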