Claude Sonnet 4 is Anthropic's balanced mid-tier model, released alongside Opus 4 in May 2025 and designed to combine strong coding and reasoning capabilities with computational efficiency. It achieves a state-of-the-art 72.7% on SWE-bench while costing significantly less and responding faster than the Opus models. Key strengths include autonomous codebase navigation, reduced error rates in agent-driven workflows, and reliable adherence to intricate instructions, making it a versatile choice for both routine and complex development tasks.
API | Vision | Reasoning | Web Search | File | Proprietary Model
Knowledge Cutoff: 2025-01-31
Context Memory: 1M tokens in / 64K tokens out
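Since the capability list above includes API access, here is a minimal sketch of how a request body for this model might be assembled, assuming the publicly documented Anthropic Messages API shape; the model ID string `claude-sonnet-4-20250514` is an assumption and should be checked against Anthropic's current model list:

```python
# Sketch: assemble (without sending) a Messages API request body.
# The model ID below is an assumption, not confirmed by this page.
def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Return a Messages API payload for a single user prompt."""
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model ID
        "max_tokens": max_tokens,             # bounded by the 64K output limit
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this diff.", max_tokens=2048)
print(req["model"], req["max_tokens"])
```

The payload maps directly onto the limits listed above: the prompt draws on the 1M-token input context, while `max_tokens` caps generation at or below the 64K output limit.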
AI Performance Evaluation
Arena Overall Score: 1399 ±4 (as of 2026-05-01)
Overall Rank: No. 109 (35,139 votes)
Arena by Ability
Hard Prompts: 1431 ±6 (No. 93)
Expert Knowledge: 1433 ±15 (No. 87)
Instruction Following: 1414 ±7 (No. 75)
Conversation Memory: 1420 ±8 (No. 82)
Creative: 1395 ±9 (No. 70)
Coding: 1473 ±8 (No. 67)
Math: 1402 ±13 (No. 103)
Arena by Occupation
Creative Writing: 1397 ±7 (No. 85)
Social Sciences: 1418 ±8 (No. 105)
Media: 1389 ±8 (No. 83)
Business: 1384 ±8 (No. 125)
Healthcare: 1419 ±13 (No. 112)
Legal: 1409 ±13 (No. 103)
Software: 1443 ±6 (No. 95)
Mathematics: 1410 ±13 (No. 103)
Source: Arena Intelligence
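The arena scores above are Elo-style ratings, so the gap between two scores maps to an expected head-to-head win rate. A minimal sketch, assuming the standard Elo formula with a 400-point scale (an assumption about how this leaderboard computes ratings):

```python
def win_probability(r_a: float, r_b: float) -> float:
    """Expected win rate of a rating r_a over r_b under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Example: the Coding ability score (1473) against the Math score (1402).
p = win_probability(1473, 1402)
print(round(p, 3))  # about 0.601
```

Under this model, a ~70-point gap corresponds to roughly a 60/40 split, which is why the ±4 to ±15 confidence intervals matter when comparing closely ranked models.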
Overall
AA Intelligence Index: 39% (↑0%)
LiveBench: 61% (↑0%)
ForecastBench: 59% (↑0%)
Reasoning & Math
AA Math Index: 74% (↑0%)
GPQA Diamond: 78% (↓4%)
HLE: 9.6% (↓8%)
MMLU-Pro: 84% (↑3%)
AIME 2025: 74% (↑0%)
MATH-500: 99% (↑6%)
LB Reasoning: 69% (↑0%)
LB Math: 71% (↓4%)
LB Data: 55% (↑1%)
Coding
AA Coding Index: 34% (↓2%)
LiveCodeBench: 66% (↑0%)
LB Coding: 77% (↑5%)
LB Agentic: 40% (↓5%)
TAU2: 65% (↓16%)
TerminalBench: 31% (↓3%)
SciCode: 40% (↓2%)
Language & Instructions
IFBench: 55% (↓8%)
AA-LCR: 65% (↑3%)
Hallucination (HHEM): 10% (↑0%)
Factual (HHEM): 90% (↑0%)
LB Language: 73% (↑1%)
LB IF: 44% (↓7%)
Output Speed
Standard Mode: 45 tok/s (↓32), first output in 0.80 s
Reasoning Mode: 49 tok/s (↓38), first output in 10.65 s
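The two speed figures combine into a rough wall-clock estimate: time to first token plus steady-state decoding time. A small sketch using the numbers above (a simplification that ignores variability in decoding speed):

```python
def generation_time(tokens: int, tok_per_s: float, first_token_s: float) -> float:
    """Rough wall-clock time: time to first token plus steady-state decoding."""
    return first_token_s + tokens / tok_per_s

# 1,000 output tokens in standard vs. reasoning mode, per the figures above.
standard = generation_time(1000, 45, 0.80)
reasoning = generation_time(1000, 49, 10.65)
print(round(standard, 1), round(reasoning, 1))
```

Note that reasoning mode's slightly higher throughput does not offset its ~10 s first-token latency until well past a thousand output tokens.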