Anthropic

Claude Opus 4.1

2025-08-05

Claude Opus 4.1 is an updated version of Anthropic's flagship model released in August 2025, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains in multi-file code refactoring, debugging precision, and detail-oriented reasoning. The model supports extended thinking up to 64K tokens and is optimized for tasks involving research, data analysis, and tool-assisted reasoning workflows.
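The extended-thinking support described above is exposed through the Messages API's `thinking` parameter. A minimal sketch of a request payload, assuming the Anthropic Python SDK and the `claude-opus-4-1-20250805` model id; the token budgets below are illustrative, not recommendations:

```python
# Sketch of an extended-thinking request payload for the Messages API.
# Model id and token budgets are assumptions based on the card above.
payload = {
    "model": "claude-opus-4-1-20250805",
    "max_tokens": 16_000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 8_000,  # thinking budget; must be below max_tokens
    },
    "messages": [
        {"role": "user", "content": "Refactor this module and explain the changes."},
    ],
}

# With the Anthropic SDK this would be sent as:
#   client = anthropic.Anthropic()
#   response = client.messages.create(**payload)
print(payload["thinking"]["budget_tokens"])
```

The thinking budget counts against the output allowance, which is why the sketch keeps `budget_tokens` well under `max_tokens`.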

API | Vision | Reasoning | Web Search | File | Proprietary Model
Knowledge Cutoff
2025-01-31
Input → Output Format
Context Memory
200K in / 32K out
Cost / 1M Tokens
$15 in / $75 out
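The listed rates translate directly into a per-request cost estimate; a minimal sketch in Python, using the $15 per 1M input tokens and $75 per 1M output tokens shown above (the example token counts are illustrative):

```python
# Estimate API cost from token counts, using the rates listed above:
# $15 per 1M input tokens, $75 per 1M output tokens.
INPUT_RATE = 15.00   # USD per 1M input tokens
OUTPUT_RATE = 75.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: a 10K-token prompt with a 2K-token reply.
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.2f}")  # → $0.30
```

Note that output tokens cost five times as much as input tokens, so long completions (and extended-thinking budgets, which count as output) dominate the bill.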

AI Performance Evaluation

Arena Overall Score
1449 ± 3 (as of 2026-05-01)
Overall Rank
No. 43 (49,853 votes)
Arena by Ability
Hard Prompts
1480 ± 5 · No. 31
Expert Knowledge
1482 ± 12 · No. 33
Instruction Following
1459 ± 6 · No. 20
Conversation Memory
1473 ± 7 · No. 28
Creative
1444 ± 8 · No. 20
Coding
1513 ± 7 · No. 22
Math
1443 ± 11 · No. 43
Arena by Occupation
Creative Writing
1445 ± 6 · No. 31
Social Sciences
1471 ± 7 · No. 35
Media
1433 ± 7 · No. 28
Business
1448 ± 7 · No. 38
Healthcare
1479 ± 12 · No. 30
Legal
1463 ± 11 · No. 35
Software
1492 ± 5 · No. 34
Mathematics
1449 ± 12 · No. 44
Overall
AA Intelligence Index
42% (↑3%)
LiveBench
61% (↑1%)
ForecastBench
60% (↑1%)
Reasoning & Math
AA Math Index
80% (↑6%)
GPQA Diamond
81% (↓1%)
HLE
12% (↓6%)
MMLU-Pro
88% (↑7%)
AIME 2025
80% (↑6%)
LB Reasoning
72% (↑3%)
LB Math
73% (↓1%)
LB Data
49% (↓4%)
Coding
AA Coding Index
37% (↑0%)
LiveCodeBench
65% (↑0%)
LB Coding
75% (↑2%)
LB Agentic
48% (↑3%)
TAU2
71% (↓9%)
TerminalBench
34% (↑0%)
SciCode
41% (↓1%)
Language & Instructions
IFBench
55% (↓8%)
AA-LCR
66% (↑4%)
Hallucination (HHEM)
12% (↑2%)
Factual (HHEM)
88% (↓2%)
LB Language
73% (↑0%)
LB IF
42% (↓9%)
Output Speed
Standard Mode
34 tok/s (↓44)
First Output: 1.33 s
Reasoning Mode
36 tok/s (↓50)
First Output: 10.14 s