Claude Opus 4 is Anthropic's flagship coding and agent model, released in May 2025, setting new standards for sustained performance on complex, long-running tasks. It leads on SWE-bench (72.5%) and Terminal-bench (43.2%), and can work continuously for hours on agentic workflows spanning thousands of steps without degradation. As a hybrid model, it offers both near-instant responses and extended thinking for deeper reasoning, along with parallel tool use and improved instruction memory.
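The hybrid behavior described above is selected per request: extended thinking is switched on by adding a `thinking` block to an otherwise ordinary Messages API call. The sketch below builds such a request body without sending it; the model ID and token budgets are assumptions for illustration and may not match your account's available versions.

```python
def build_request(prompt: str, thinking_budget: int = 4096) -> dict:
    """Assemble a Messages API request body with extended thinking
    enabled (the hybrid model's deeper-reasoning mode)."""
    return {
        "model": "claude-opus-4-20250514",  # assumed model ID
        "max_tokens": 8192,                 # must exceed the thinking budget
        "thinking": {
            "type": "enabled",
            "budget_tokens": thinking_budget,
        },
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this function to be tail-recursive.")
print(req["thinking"]["budget_tokens"])  # 4096
```

Omitting the `thinking` key yields the near-instant standard mode; the same model serves both.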
API | Vision | Reasoning | Web Search | File | Proprietary Model
Knowledge Cutoff: 2025-05-01
Context Memory (Input → Output): 1M in / 128K out
AI Performance Evaluation

Arena Overall Score: 1424 ±4 (as of 2026-05-01)
Overall Rank: No. 73 (36,941 votes)
Arena by Ability
Hard Prompts: 1456 ±6 (No. 58)
Expert Knowledge: 1446 ±14 (No. 74)
Instruction Following: 1443 ±7 (No. 37)
Conversation Memory: 1437 ±8 (No. 62)
Creative: 1429 ±9 (No. 38)
Coding: 1498 ±8 (No. 40)
Math: 1419 ±12 (No. 76)
Arena by Occupation
Creative Writing: 1429 ±7 (No. 43)
Social Sciences: 1438 ±8 (No. 76)
Media: 1420 ±8 (No. 46)
Business: 1412 ±8 (No. 90)
Healthcare: 1445 ±13 (No. 75)
Legal: 1435 ±12 (No. 71)
Software: 1466 ±6 (No. 61)
Mathematics: 1424 ±13 (No. 75)
Source: Arena Intelligence
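Arena scores are Elo-style ratings, so a gap between two ratings maps to an expected head-to-head win rate. Assuming the standard Elo logistic formula (an assumption; Arena's exact statistical model may differ), a quick sketch:

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical matchup between a 1498-rated and a 1419-rated model,
# a gap comparable to those in the tables above:
print(round(win_probability(1498, 1419), 2))  # 0.61
```

A ~79-point gap thus implies only about a 61% win rate, which is why small rating differences rarely translate into decisive rank separation.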
Overall
AA Intelligence Index: 39% (↑0%)
ForecastBench: 61% (↑1%)
Reasoning & Math
AA Math Index: 73% (↓1%)
GPQA Diamond: 80% (↓3%)
HLE: 12% (↓6%)
MMLU-Pro: 87% (↑6%)
AIME 2025: 73% (↓1%)
MATH-500: 98% (↑5%)
Coding
AA Coding Index: 34% (↓2%)
LiveCodeBench: 64% (↓2%)
TAU2: 73% (↓7%)
TerminalBench: 31% (↓3%)
SciCode: 40% (↓2%)
Language & Instructions
IFBench: 54% (↓9%)
AA-LCR: 34% (↓28%)
Hallucination (HHEM): 12% (↑2%)
Factual (HHEM): 88% (↓2%)
Output Speed
Standard Mode: 34 tok/s (↓43), first output 1.33 s
Reasoning Mode: 35 tok/s (↓52), first output 7.61 s
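The speed figures combine into a rough end-to-end estimate: time-to-first-token plus steady-state decoding at the measured throughput. A back-of-the-envelope sketch using the numbers above (a simplification that ignores queuing and rate limits):

```python
def response_time(n_tokens: int, tok_per_s: float, first_token_s: float) -> float:
    """Rough total generation time: time-to-first-token plus
    n_tokens decoded at the measured throughput."""
    return first_token_s + n_tokens / tok_per_s

# A 1,000-token reply under each mode's measured figures:
standard = response_time(1000, 34, 1.33)    # ~30.7 s
reasoning = response_time(1000, 35, 7.61)   # ~36.2 s
print(round(standard, 1), round(reasoning, 1))  # 30.7 36.2
```

Note that reasoning mode's penalty here is almost entirely its 7.61 s first-output latency; its per-token throughput is essentially unchanged.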