gpt-oss-120b

Name: gpt-oss-120b
Author: OpenAI

OpenAI's open-weight 117B MoE reasoning model under Apache 2.0

Overview

gpt-oss-120b is OpenAI's larger open-weight language model, released August 5, 2025 alongside the smaller gpt-oss-20b. It is a 116.8B-parameter Mixture-of-Experts transformer that activates only about 5.1B parameters per token, and it ships under the permissive Apache 2.0 license — the first open-weight models OpenAI had released since GPT-2. The weights are freely downloadable from Hugging Face and the openai/gpt-oss GitHub repository.

Built for high-end reasoning and agentic work, gpt-oss-120b exposes a configurable reasoning effort (low / medium / high) with full chain-of-thought visibility, and natively supports function calling, web browsing, Python tool use, and structured outputs. OpenAI positions it as reaching near-parity with its proprietary o4-mini model on core reasoning benchmarks while running on a single 80GB GPU thanks to native MXFP4 quantization of the expert weights.

Because gpt-oss-120b is open-weight rather than served first-party by OpenAI, it is hosted across many inference providers (OpenRouter, Fireworks, Together, DeepInfra, Novita, Groq, AWS Bedrock, Databricks, and others), each setting its own price. It is text-only, trained primarily on English STEM, coding, and general-knowledge data, with a June 2024 knowledge cutoff and a 131,072-token context window.

Released	2025-08-05
License	Apache 2.0
Weights	Open weights
Parameters	116.8B total / 5.1B active (MoE)
Context	131K
Max output	131K
Architecture	Mixture-of-Experts transformer with 36 layers, 128 experts (top-4 active per token), and roughly 5.1B active parameters per token out of 116.8B total. Uses Grouped Query Attention (64 query heads, 8 key-value heads), alternating banded-window and dense attention, rotary position embeddings extended via YaRN to a 131,072-token context, and native MXFP4 quantization of the MoE weights so the model fits on a single 80GB GPU (NVIDIA H100 or AMD MI300X). Supports configurable reasoning effort (low / medium / high) and is trained for OpenAI's "harmony" response format. Text-only.
Knowledge cutoff	June 2024
Modalities	Text
Status	Available

Benchmarks

MMLU90%
GPQA Diamond (no tools)80.1%
AIME 2024 (with tools)96.6%
AIME 2025 (with tools)97.9%
SWE-bench Verified62.4%
HealthBench57.6%
Humanity's Last Exam (with tools)19%
Tau-Bench Retail67.8%
Artificial Analysis Intelligence Index (high)24%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.039 / 1M tokens per 1M tokens
Output	$0.18 / 1M tokens per 1M tokens

gpt-oss-120b is open-weight, so it is not served first-party by OpenAI — prices vary by host. Figures shown are a representative low rate from OpenRouter providers (e.g. DeepInfra); other providers charge more. Self-hosting incurs only your own compute cost.

Pricing source ↗

Strengths

State-of-the-art open-weight reasoning — near-parity with OpenAI o4-mini on core benchmarks
Efficient MoE design: ~5.1B active params runs on a single 80GB GPU via native MXFP4 quantization
Permissive Apache 2.0 license allowing commercial use and fine-tuning
Configurable reasoning effort (low/medium/high) with full chain-of-thought access
Strong agentic tooling: function calling, web browsing, Python execution, structured outputs
131K-token context window for long documents and agent traces
Self-hostable open weights — no vendor lock-in, deployable on-prem

Best for

On-premise / private reasoning assistants where data cannot leave the org
Agentic workflows requiring tool calling, browsing, and code execution
Cost-controlled high-volume inference via self-hosting or cheap third-party providers
Math, science, and competition-coding tasks (strong AIME and Codeforces results)
Fine-tuning a frontier-class open model for domain-specific applications
Research into chain-of-thought reasoning with fully visible reasoning traces

How to access

Provider	Model ID
OpenRouter ↗	`openai/gpt-oss-120b`
Hugging Face ↗	`openai/gpt-oss-120b`
Fireworks AI ↗	`accounts/fireworks/models/gpt-oss-120b`
NVIDIA NIM ↗	`openai/gpt-oss-120b`
AWS Bedrock ↗	`openai.gpt-oss-120b`

gpt-oss (Open Weight) — every version

The full lineage of the gpt-oss (Open Weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
gpt-oss-120bcurrent	2025-08-05	—	Apache-2.0
gpt-oss-20b	2025-08-05	—	Apache-2.0

FAQ

Is gpt-oss-120b free and open source?

The weights are released under the permissive Apache 2.0 license, so you can download, run, fine-tune, and deploy gpt-oss-120b commercially without copyleft restrictions. Running it still costs compute — either your own hardware or a paid inference provider — but there are no license fees.

What hardware do I need to run gpt-oss-120b?

Thanks to native MXFP4 quantization of its Mixture-of-Experts weights, gpt-oss-120b fits on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X. It has 116.8B total parameters but activates only about 5.1B per token.

How does gpt-oss-120b compare to OpenAI's o4-mini?

OpenAI reports gpt-oss-120b reaches near-parity with the proprietary o4-mini on core reasoning benchmarks, and it matches or exceeds o4-mini on competition math (AIME), health questions (HealthBench), and tool calling, while being fully open-weight and self-hostable.

Is gpt-oss-120b multimodal?

No. gpt-oss-120b is text-only — it does not accept image, audio, or video input. It supports a 131,072-token context window, configurable reasoning effort, and native tool use including function calling, browsing, and Python execution.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// gpt-oss (Open Weight) — every version

// FAQ