GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, and performs complex coding tasks.
Vision | Reasoning | Proprietary Model
Knowledge Cutoff
Unknown
Input → Output Format
Context Memory
203K in / 131K out
Source: Official Docs
AI Performance Evaluation
Overall
AA Intelligence Index: 43% (↑4%)
LiveBench: 49% (↓12%)
Reasoning & Math
GPQA Diamond: 81% (↓1%)
HLE: 16% (↓2%)
LB Reasoning: 56% (↓13%)
LB Math: 70% (↓4%)
LB Data: 54% (↑1%)
Coding
AA Coding Index: 36% (↑0%)
LB Coding: 74% (↑1%)
LB Agentic: 3.3% (↓42%)
TAU2: 99% (↑18%)
TerminalBench: 33% (↓1%)
SciCode: 44% (↑2%)
Language & Instructions
IFBench: 61% (↓2%)
AA-LCR: 61% (↓1%)
LB Language: 62% (↓10%)
LB IF: 27% (↓24%)
Output Speed
Standard Mode: 23 tok/s (↓54)
First Output: 5.38 s
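As a rough illustration of what the speed figures above mean in practice, the sketch below estimates how long a response of a given length takes to finish. It assumes a simple model that the listing does not state: the response begins after the first-output delay and then streams at a constant rate. The function name `estimate_response_seconds` is hypothetical, and real latency will vary with load and prompt size.

```python
# Figures copied from the Output Speed listing (Standard Mode).
TOKENS_PER_SECOND = 23.0      # listed output speed
FIRST_OUTPUT_SECONDS = 5.38   # listed delay before the first output


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate total response time for a given number of output tokens.

    Assumption (not from the source): total time is the first-output delay
    plus the token count divided by a constant streaming rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return FIRST_OUTPUT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    for n in (230, 2300):
        print(f"{n} tokens: about {estimate_response_seconds(n):.2f} s")
```

Under this assumption, the fixed first-output delay dominates short replies, while the streaming rate dominates long ones.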