xAI

Grok 4.20

Name: xAI Grok 4.20
Author: xAI


Model ID: grok-4.20-0309-non-reasoning

2026-03-09


Grok 4.20 is xAI's newest flagship model, released in February 2026. It introduces a native 4-agent multi-agent architecture in which specialized AI agents collaborate simultaneously on complex queries. It maintains a 2M-token context window, the largest among Western frontier models, and achieves a 65% reduction in hallucination rates through cross-agent verification. The model's capabilities are updated weekly based on real-world usage, and it delivers fast direct answers at 232 tokens per second with a 0.54-second time-to-first-token.

xAI SuperGrok Heavy | API | Web Search | Proprietary Model

Knowledge Cutoff

Unknown

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

2M in / 2M out

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.

Cost/1M Tokens

$1.25 in / $2.50 out

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).
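As a rough illustration of how the per-token pricing translates into a per-request cost, the sketch below multiplies token counts by the listed rates ($1.25 per 1M input tokens, $2.5 per 1M output tokens). The function and constant names are hypothetical, and actual billing may differ (for example through caching or tiered pricing, which this page does not describe).

```python
# Listed rates from this page, in USD per 1 million tokens.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (illustrative only)."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example: a 200,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints "$0.26"
```

Under these rates, input tokens dominate the cost of long-document requests: the 200,000-token prompt accounts for $0.25 of the $0.26 total.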


Source: Official Docs, OpenRouter

AI Performance Evaluation

Arena Overall Score

1480 ±5

As of 2026-05-01

Overall Rank

No. 9

17,413 Votes

Arena by Ability

Hard Prompts: 1494 ±6, No. 15
Expert Knowledge: 1473 ±16, No. 42
Instruction Following: 1455 ±8, No. 25
Conversation Memory: 1494 ±12, No. 10
Creative: 1467 ±12, No. 8
Coding: 1511 ±9, No. 24
Math: 1461 ±17, No. 29

Arena by Occupation

Creative Writing: 1457 ±10, No. 16
Social Sciences: 1485 ±11, No. 19
Media: 1455 ±11, No. 10
Business: 1476 ±11, No. 14
Healthcare: 1512 ±17, No. 6
Legal: 1496 ±17, No. 9
Software: 1509 ±8, No. 17
Mathematics: 1461 ±19, No. 33

Source: Arena Intelligence

Overall

AA Intelligence Index: 29% ↓10%
LiveBench: 38% ↓23%
ForecastBench: 62% ↑3%

Reasoning & Math

GPQA Diamond: 78% ↓5%
HLE: 24% ↑7%
LB Reasoning: 26% ↓43%
LB Math: 46% ↓29%
LB Data: 43% ↓10%

Coding

AA Coding Index: 22% ↓14%
LB Coding: 59% ↓14%
LB Agentic: 38% ↓7%
TAU2: 60% ↓21%
TerminalBench: 17% ↓17%
SciCode: 33% ↓9%

Language & Instructions

IFBench: 49% ↓14%
AA-LCR: 17% ↓45%
LB Language: 42% ↓30%
LB IF: 24% ↓27%

Output Speed

Standard Mode: 91 tok/s ↑14, First Output 0.50s
Reasoning Mode: 248 tok/s ↑161, First Output 11.74s
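To show how the throughput and first-output figures above combine, the sketch below approximates total response time as first-output latency plus output tokens divided by tokens per second. This simple linear model and the function name are assumptions for illustration; the source reports only the raw measurements, and real latency varies with load and prompt length.

```python
def estimate_response_seconds(output_tokens: int,
                              tokens_per_second: float,
                              first_output_seconds: float) -> float:
    """Approximate wall-clock time to stream a full reply (illustrative model)."""
    return first_output_seconds + output_tokens / tokens_per_second


# Figures listed on this page for a 1,000-token reply.
standard = estimate_response_seconds(1000, 91, 0.50)
reasoning = estimate_response_seconds(1000, 248, 11.74)
print(f"standard: {standard:.1f}s, reasoning: {reasoning:.1f}s")
# prints "standard: 11.5s, reasoning: 15.8s"
```

Under this model, Standard Mode finishes first for short replies despite its lower throughput, because Reasoning Mode's 11.74-second first-output delay is only recovered after roughly 1,600 output tokens.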

Source: Artificial Analysis, LiveBench, ForecastBench
