xAI

Grok 4.20 (Reasoning)

2026-03-31

Grok 4.20 (Reasoning) is the reasoning-enabled configuration of xAI's Grok 4.20. It uses extended internal thinking to work through a problem before presenting an answer. Combined with the model's native multi-agent architecture and cross-agent verification, this gives it the highest accuracy in the Grok lineup on tasks that require deep logic, mathematical reasoning, and complex multi-step problem solving. It supports the same 2M-token context window as the standard configuration, along with strict prompt adherence and the lowest hallucination rate in its class.

xAI SuperGrok Heavy, API | Vision, Reasoning, Web Search, File | Proprietary Model
Knowledge Cutoff: Unknown
Input → Output Format
Context Memory: 2M in / 2M out
Cost/1M Words: $1.25 in / $2.50 out
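
The listed prices translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name and the example request sizes are hypothetical, it assumes the listing's unit is billed linearly, and it does not model whether internal reasoning output is billed separately, which the listing does not say.

```python
# Prices taken from the listing above, in USD per 1M units.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_units: int, output_units: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes linear billing at the listed rates; any separate charge for
    reasoning output is not modeled here.
    """
    total = input_units * INPUT_PRICE_PER_M + output_units * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example request sizes are made up for illustration.
cost = estimate_cost(input_units=200_000, output_units=10_000)
print(f"${cost:.4f}")  # prints $0.2750
```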

AI Performance Evaluation

Arena Overall Score: 1480 ±5 (as of 2026-05-01)
Overall Rank: No. 9 (17,413 votes)
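
One way to read an Arena score is as an expected head-to-head win rate. The sketch below assumes the score follows the common Elo-style convention (base 10, 400-point scale), which this listing does not state. The opponent score of 1500 is hypothetical, and the ±5 interval is ignored.

```python
def expected_win_rate(score_a: float, score_b: float) -> float:
    """Expected probability that A is preferred over B.

    Assumes an Elo-style scale (base 10, 400 points); this convention is
    an assumption, not something the listing confirms.
    """
    return 1 / (1 + 10 ** ((score_b - score_a) / 400))


# 1480 is the overall score listed above; 1500 is a hypothetical opponent.
p = expected_win_rate(1480, 1500)
print(f"{p:.3f}")  # prints 0.471
```

Under that assumption, a 20-point gap corresponds to a win rate only a few points below 50%, so small differences in score imply a nearly even matchup.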
Arena by Ability
Hard Prompts: 1494 ±6, No. 15
Expert Knowledge: 1473 ±16, No. 42
Instruction Following: 1455 ±8, No. 25
Conversation Memory: 1494 ±12, No. 10
Creative: 1467 ±12, No. 8
Coding: 1511 ±9, No. 24
Math: 1461 ±17, No. 29
Arena by Occupation
Creative Writing: 1457 ±10, No. 16
Social Sciences: 1485 ±11, No. 19
Media: 1455 ±11, No. 10
Business: 1476 ±11, No. 14
Healthcare: 1512 ±17, No. 6
Legal: 1496 ±17, No. 9
Software: 1509 ±8, No. 17
Mathematics: 1461 ±19, No. 33
Overall
AA Intelligence Index: 49% (↑10%)
LiveBench: 69% (↑8%)
Reasoning & Math
GPQA Diamond: 91% (↑9%)
HLE: 32% (↑15%)
LB Reasoning: 75% (↑6%)
LB Math: 87% (↑13%)
LB Data: 63% (↑10%)
Coding
AA Coding Index: 41% (↑4%)
LB Coding: 66% (↓7%)
LB Agentic: 43% (↓2%)
TAU2: 93% (↑13%)
TerminalBench: 38% (↑4%)
SciCode: 46% (↑4%)
Language & Instructions
IFBench: 81% (↑18%)
AA-LCR: 58% (↓4%)
LB Language: 78% (↑5%)
LB IF: 63% (↑12%)
Output Speed
Standard Mode: 89 tok/s (↑11), first output 0.53 s
Reasoning Mode: 91 tok/s (↑4), first output 30.82 s
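
The two speed figures combine into a rough end-to-end response time: time to first output plus output length divided by throughput. The sketch below is a back-of-the-envelope estimate only. The helper name and the 1,000-token response length are hypothetical, and it assumes throughput stays constant once output begins.

```python
def estimate_response_seconds(
    output_tokens: int, first_output_s: float, tokens_per_s: float
) -> float:
    """Rough total response time: first-output delay plus streaming time.

    Hypothetical helper; assumes a constant streaming rate after the
    first token appears.
    """
    return first_output_s + output_tokens / tokens_per_s


# Figures from the listing above; the 1,000-token length is an example.
standard = estimate_response_seconds(1000, first_output_s=0.53, tokens_per_s=89)
reasoning = estimate_response_seconds(1000, first_output_s=30.82, tokens_per_s=91)

print(f"Standard:  {standard:.1f} s")   # prints Standard:  11.8 s
print(f"Reasoning: {reasoning:.1f} s")  # prints Reasoning: 41.8 s
```

On these numbers, the difference between the two modes comes almost entirely from the delay before the first output, not from streaming speed.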