DeepSeek

DeepSeek V4 Flash

2026-04-24

DeepSeek V4 Flash is the compact, low-latency variant of the V4 series, released April 24, 2026, with 284B total parameters (13B active) — built for cost-efficient inference without sacrificing long-context reasoning. It shares the same Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) architecture as V4 Pro, supporting a 1M-token context window with both Thinking and Non-Thinking modes. Despite its smaller footprint, the V4 Flash base model outperforms the much larger V3.2 base across most benchmarks, particularly in long-context tasks. At $0.14 per million input tokens and $0.28 per million output tokens, it ranks among the cheapest frontier-class models available, making it ideal for high-throughput agentic and document-processing workloads.
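The listed prices make per-request cost easy to estimate. A minimal sketch (the helper name and token counts are illustrative, not part of any official SDK) that applies the $0.14/$0.28 per-million-token rates quoted above:

```python
# Per-1M-token rates listed on this page for V4 Flash.
INPUT_PRICE_PER_M = 0.14   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.28  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Illustrative long-context request: 800K tokens in, 4K tokens out.
print(f"${estimate_cost(800_000, 4_000):.4f}")  # ≈ $0.1131
```

Even a near-full-context call (800K input tokens) comes in around eleven cents, which is the basis for the high-throughput positioning described above.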

Reasoning | Proprietary Model
Knowledge Cutoff: 2025-05
Input → Output Format
Context Memory: 1.0M in / 384K out
Cost/1M Tokens: $0.14 in / $0.28 out

AI Performance Evaluation

Arena Overall Score: 1439 ±9 (as of 2026-05-01)
Overall Rank: No.55 (3,603 votes)
Arena by Ability
Hard Prompts: 1463 ±12 (No.50)
Expert Knowledge: 1456 ±29 (No.58)
Instruction Following: 1428 ±16 (No.56)
Conversation Memory: 1443 ±23 (No.58)
Creative: 1404 ±22 (No.59)
Coding: 1480 ±19 (No.58)
Math: 1441 ±36 (No.46)
Arena by Occupation
Creative Writing: 1419 ±19 (No.52)
Social Sciences: 1460 ±22 (No.49)
Media: 1404 ±21 (No.58)
Business: 1432 ±21 (No.62)
Healthcare: 1466 ±35 (No.50)
Legal: 1468 ±32 (No.31)
Software: 1477 ±15 (No.51)
Mathematics: 1449 ±40 (No.42)
Overall
AA Intelligence Index: 47% (↑7%)
LiveBench: 68% (↑7%)
Reasoning & Math
GPQA Diamond: 89% (↑7%)
HLE: 32% (↑15%)
LB Reasoning: 71% (↑2%)
LB Math: 80% (↑5%)
LB Data: 68% (↑15%)
Coding
AA Coding Index: 39% (↑2%)
LB Coding: 69% (↓4%)
LB Agentic: 50% (↑5%)
TAU2: 95% (↑15%)
TerminalBench: 36% (↑2%)
SciCode: 45% (↑3%)
Language & Instructions
IFBench: 79% (↑16%)
AA-LCR: 63% (↑1%)
LB Language: 70% (↓2%)
LB IF: 63% (↑12%)
Output Speed (Standard Mode): 68 tok/s (↓9)
First Output: 0.78s