OpenAI

GPT OSS 120B

Name: OpenAI GPT OSS 120B
Author: OpenAI

Compare

Model ID:openai/gpt-oss-120b

2025-08-05

Compare

GPT-OSS-120B is OpenAI's first open-weight language model, featuring 117 billion total parameters in a Mixture-of-Experts architecture that activates just 5.1 billion per forward pass. Optimized to run on a single 80GB GPU with native MXFP4 quantization, it achieves near-parity with o4-mini on core reasoning benchmarks while supporting configurable reasoning depth, full chain-of-thought access, and native tool use including function calling and structured outputs. Released under the Apache 2.0 license, it brings frontier-level reasoning and agentic capabilities to a fully customizable, locally deployable model.

API|Reasoning|Open ModelApache 2.0

Knowledge Cutoff

2024-06-30

The date this AI finished learning. It may not know about things that happened after this date.

Input → Output Format

The types of content this AI can receive, and what it can produce in return.

Context Memory

131KIN131KOUT

The maximum amount of text the AI can read and process in a single request. A larger number means it can handle longer documents or conversations.

Cost/1M Words

$0.039IN$0.18OUT

The cost of using this AI directly in your own application. Shown in USD per 1 million units of text (tokens).

Calculate Cost

Source:Official Docs OpenRouter

AI Performance Evaluation

Arena Overall Score

1353

±4

As of 2026-05-01

Overall Rank

No.160

30,670 Votes

Arena by Ability

Hard Prompts

1362±6No.165

Expert Knowledge

1360±17No.156

Instruction Following

1326±7No.172

Conversation Memory

1328±9No.180

Creative

1279±10No.212

Coding

1390±8No.163

Math

1383±14No.133

Arena by Occupation

Creative Writing

1310±8No.186

Social Sciences

1361±9No.169

Media

1287±8No.193

Business

1350±8No.163

Healthcare

1369±15No.159

Legal

1345±14No.179

Software

1386±6No.162

Mathematics

1384±15No.134

Source:Arena Intelligence

Overall

AA Intelligence Index

25%↓15%

LiveBench

46%↓14%

Reasoning & Math

AA Math Index

67%↓8%

GPQA Diamond

67%↓15%

HLE

5.2%↓12%

MMLU-Pro

78%↓4%

AIME 2025

67%↓8%

LB Reasoning

39%↓30%

LB Math

69%↓5%

LB Data

39%↓14%

Coding

AA Coding Index

16%↓21%

LiveCodeBench

71%↑5%

LB Coding

60%↓13%

LB Agentic

17%↓28%

TAU2

45%↓35%

TerminalBench

5.3%↓29%

SciCode

36%↓6%

Language & Instructions

IFBench

58%↓5%

AA-LCR

44%↓18%

Hallucination (HHEM)

14%↑4%

Factual (HHEM)

86%↓4%

LB Language

49%↓24%

LB IF

50%↓1%

Output Speed

Standard Mode

86tok/s↑9

First Output 0.48s

Reasoning Mode

233tok/s↑146

First Output 9.09s

Source:Artificial Analysis LiveBench Vectara HHEM OpenRouter

OpenAI