GPT-5 is OpenAI's unified frontier model, released in August 2025, that combines advanced reasoning, coding, and multimodal capabilities in a single system. It introduced test-time compute scaling with configurable thinking depth and significantly reduced hallucinations and sycophancy compared with previous models. GPT-5 excels at complex multi-step tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes scenarios, with notable improvements in coding, writing, and factual reliability.
API | Vision, Reasoning, File | Proprietary Model
Knowledge Cutoff: 2024-09-30
Input → Output Format
Context Memory: 400K in / 128K out
AI Performance Evaluation
Arena Overall Score: 1434 ±5 (as of 2026-05-01)
Overall Rank: No. 60
Votes: 31,971
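To give a rough sense of what a score gap means, the sketch below converts two Arena scores into an expected head-to-head win rate. It assumes the scores sit on an Elo-style scale (base 10, 400-point divisor); the function name and the 1334 opponent score are hypothetical and used only for illustration, and the actual Arena methodology may differ in detail.

```python
def expected_win_probability(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected probability that model A is preferred over model B.

    Assumes an Elo-style logistic curve: a gap of `scale` points
    corresponds to 10:1 odds in favor of the higher-rated model.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


if __name__ == "__main__":
    gpt5_score = 1434.0      # Arena Overall Score from this page
    opponent_score = 1334.0  # hypothetical opponent, 100 points lower
    p = expected_win_probability(gpt5_score, opponent_score)
    print(f"Expected win rate vs. opponent: {p:.1%}")
```

Under this assumption, equal scores give a 50% expected win rate, and the reported ±5 interval moves the estimate only slightly.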
Arena by Ability
Hard Prompts: 1446 ±6 (No. 72)
Expert Knowledge: 1458 ±16 (No. 54)
Instruction Following: 1409 ±7 (No. 80)
Conversation Memory: 1420 ±9 (No. 80)
Creative: 1375 ±10 (No. 100)
Coding: 1466 ±8 (No. 76)
Math: 1434 ±14 (No. 52)
Arena by Occupation
Creative Writing: 1397 ±8 (No. 82)
Social Sciences: 1443 ±9 (No. 72)
Media: 1398 ±8 (No. 67)
Business: 1414 ±9 (No. 85)
Healthcare: 1456 ±15 (No. 62)
Legal: 1455 ±14 (No. 47)
Software: 1452 ±7 (No. 87)
Mathematics: 1441 ±14 (No. 58)
Source: Arena Intelligence
Overall
AA Intelligence Index: 22% (↓17%)
LiveBench: 71% (↑11%)
ForecastBench: 61% (↑2%)
Reasoning & Math
AA Math Index: 48% (↓26%)
GPQA Diamond: 69% (↓14%)
HLE: 5.8% (↓12%)
MMLU-Pro: 82% (↑1%)
AIME 2025: 48% (↓26%)
LB Reasoning: 82% (↑13%)
LB Math: 86% (↑12%)
LB Data: 57% (↑4%)
Coding
AA Coding Index: 21% (↓15%)
LiveCodeBench: 54% (↓11%)
LB Coding: 72% (↓1%)
LB Agentic: 52% (↑7%)
TAU2: 0.0% (↓80%)
TerminalBench: 13% (↓21%)
SciCode: 38% (↓4%)
Language & Instructions
IFBench: 45% (↓18%)
AA-LCR: 64% (↑2%)
Hallucination (HHEM): 15% (↑5%)
Factual (HHEM): 85% (↓5%)
LB Language: 81% (↑8%)
LB IF: 64% (↑13%)
Output Speed
Standard Mode: 77 tok/s (↑0), first output 1.03 s
Reasoning Mode: 85 tok/s (↓1), first output 41.72 s
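The throughput and first-output figures above can be combined into a back-of-the-envelope estimate of how long a complete response takes. The sketch below assumes that the first-output delay and the generation rate simply add and that throughput stays constant for the whole response; the function name and the 1,000-token response length are hypothetical choices for illustration.

```python
def estimated_response_seconds(first_output_s: float,
                               tokens_per_s: float,
                               output_tokens: int) -> float:
    """Rough end-to-end time: wait for first output, then stream the rest.

    Assumes constant throughput and that the first-output delay is not
    already included in the tokens-per-second figure.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return first_output_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length
    standard = estimated_response_seconds(1.03, 77.0, response_tokens)
    reasoning = estimated_response_seconds(41.72, 85.0, response_tokens)
    print(f"Standard mode:  ~{standard:.1f} s")
    print(f"Reasoning mode: ~{reasoning:.1f} s")
```

With these numbers, the first-output delay rather than the streaming rate accounts for most of the gap between the two modes: reasoning mode streams slightly faster but starts roughly 40 seconds later.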