Claude Opus 4 is Anthropic's flagship coding and agent model, released in May 2025, setting new standards for sustained performance on complex, long-running tasks. It leads on SWE-bench (72.5%) and Terminal-bench (43.2%), and can work continuously for hours on agentic workflows spanning thousands of steps without degradation. As a hybrid model, it offers both near-instant responses and extended thinking for deeper reasoning, along with parallel tool use and improved instruction memory.
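The hybrid behavior described above is selected per request: extended thinking is switched on by adding a `thinking` block to an otherwise ordinary Messages API call. The sketch below builds such a request body without sending it; the model ID and token budgets are assumptions for illustration and may not match your account's available versions.

```python
def build_request(prompt: str, thinking_budget: int = 4096) -> dict:
    """Assemble a Messages API request body with extended thinking
    enabled (the hybrid model's deeper-reasoning mode)."""
    return {
        "model": "claude-opus-4-20250514",  # assumed model ID
        "max_tokens": 8192,                 # must exceed the thinking budget
        "thinking": {
            "type": "enabled",
            "budget_tokens": thinking_budget,
        },
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this function to be tail-recursive.")
print(req["thinking"]["budget_tokens"])  # 4096
```

Omitting the `thinking` key yields the near-instant standard mode; the same model serves both.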
API | Vision | Reasoning | Web Search | File | Proprietary Model
Knowledge Cutoff: 2025-05-01
Context Memory (Input → Output): 1M in / 128K out
AI Performance Evaluation

Arena Overall Score: 1424 ±4 (as of 2026-05-01)
Overall Rank: No. 73 (36,941 votes)
Arena by Ability
Hard Prompts: 1456 ±6 (No. 58)
Expert Knowledge: 1446 ±14 (No. 74)
Instruction Following: 1443 ±7 (No. 37)
Conversation Memory: 1437 ±8 (No. 62)
Creative: 1429 ±9 (No. 38)
Coding: 1498 ±8 (No. 40)
Math: 1419 ±12 (No. 76)
Arena by Occupation
Creative Writing: 1429 ±7 (No. 43)
Social Sciences: 1438 ±8 (No. 76)
Media: 1420 ±8 (No. 46)
Business: 1412 ±8 (No. 90)
Healthcare: 1445 ±13 (No. 75)
Legal: 1435 ±12 (No. 71)
Software: 1466 ±6 (No. 61)
Mathematics: 1424 ±13 (No. 75)
Source: Arena Intelligence
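Arena scores are Elo-style ratings, so a gap between two ratings maps to an expected head-to-head win rate. Assuming the standard Elo logistic formula (an assumption; Arena's exact statistical model may differ), a quick sketch:

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical matchup between a 1498-rated and a 1419-rated model,
# a gap comparable to those in the tables above:
print(round(win_probability(1498, 1419), 2))  # 0.61
```

A ~79-point gap thus implies only about a 61% win rate, which is why small rating differences rarely translate into decisive rank separation.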
Overall
AA Intelligence Index: 39% (↑0%)
ForecastBench: 61% (↑1%)
Reasoning & Math
AA Math Index: 73% (↓1%)
GPQA Diamond: 80% (↓3%)
HLE: 12% (↓6%)
MMLU-Pro: 87% (↑6%)
AIME 2025: 73% (↓1%)
MATH-500: 98% (↑5%)
Coding
AA Coding Index: 34% (↓2%)
LiveCodeBench: 64% (↓2%)
TAU2: 73% (↓7%)
TerminalBench: 31% (↓3%)
SciCode: 40% (↓2%)
Language & Instructions
IFBench: 54% (↓9%)
AA-LCR: 34% (↓28%)
Hallucination (HHEM): 12% (↑2%)
Factual (HHEM): 88% (↓2%)
Output Speed
Standard Mode: 34 tok/s (↓43), first output 1.33 s
Reasoning Mode: 35 tok/s (↓52), first output 7.61 s
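The speed figures combine into a rough end-to-end estimate: time-to-first-token plus steady-state decoding at the measured throughput. A back-of-the-envelope sketch using the numbers above (a simplification that ignores queuing and rate limits):

```python
def response_time(n_tokens: int, tok_per_s: float, first_token_s: float) -> float:
    """Rough total generation time: time-to-first-token plus
    n_tokens decoded at the measured throughput."""
    return first_token_s + n_tokens / tok_per_s

# A 1,000-token reply under each mode's measured figures:
standard = response_time(1000, 34, 1.33)    # ~30.7 s
reasoning = response_time(1000, 35, 7.61)   # ~36.2 s
print(round(standard, 1), round(reasoning, 1))  # 30.7 36.2
```

Note that reasoning mode's penalty here is almost entirely its 7.61 s first-output latency; its per-token throughput is essentially unchanged.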