Overview
gpt-oss-120b is OpenAI's larger open-weight language model, released August 5, 2025 alongside the smaller gpt-oss-20b. It is a 116.8B-parameter Mixture-of-Experts transformer that activates only about 5.1B parameters per token, and it ships under the permissive Apache 2.0 license — the first open-weight models OpenAI had released since GPT-2. The weights are freely downloadable from Hugging Face and the openai/gpt-oss GitHub repository.
Built for high-end reasoning and agentic work, gpt-oss-120b exposes a configurable reasoning effort (low / medium / high) with full chain-of-thought visibility, and natively supports function calling, web browsing, Python tool use, and structured outputs. OpenAI positions it as reaching near-parity with its proprietary o4-mini model on core reasoning benchmarks while running on a single 80GB GPU thanks to native MXFP4 quantization of the expert weights.
Because gpt-oss-120b is open-weight rather than served first-party by OpenAI, it is hosted across many inference providers (OpenRouter, Fireworks, Together, DeepInfra, Novita, Groq, AWS Bedrock, Databricks, and others), each setting its own price. It is text-only, trained primarily on English STEM, coding, and general-knowledge data, with a June 2024 knowledge cutoff and a 131,072-token context window.
| Released | 2025-08-05 |
|---|---|
| License | Apache 2.0 |
| Weights | Open weights |
| Parameters | 116.8B total / 5.1B active (MoE) |
| Context | 131K |
| Max output | 131K |
| Architecture | Mixture-of-Experts transformer with 36 layers, 128 experts (top-4 active per token), and roughly 5.1B active parameters per token out of 116.8B total. Uses Grouped Query Attention (64 query heads, 8 key-value heads), alternating banded-window and dense attention, rotary position embeddings extended via YaRN to a 131,072-token context, and native MXFP4 quantization of the MoE weights so the model fits on a single 80GB GPU (NVIDIA H100 or AMD MI300X). Supports configurable reasoning effort (low / medium / high) and is trained for OpenAI's "harmony" response format. Text-only. |
| Knowledge cutoff | June 2024 |
| Modalities | Text |
| Status | Available |
Benchmarks
- MMLU90%
- GPQA Diamond (no tools)80.1%
- AIME 2024 (with tools)96.6%
- AIME 2025 (with tools)97.9%
- SWE-bench Verified62.4%
- HealthBench57.6%
- Humanity's Last Exam (with tools)19%
- Tau-Bench Retail67.8%
- Artificial Analysis Intelligence Index (high)24%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.039 / 1M tokens per 1M tokens |
|---|---|
| Output | $0.18 / 1M tokens per 1M tokens |
gpt-oss-120b is open-weight, so it is not served first-party by OpenAI — prices vary by host. Figures shown are a representative low rate from OpenRouter providers (e.g. DeepInfra); other providers charge more. Self-hosting incurs only your own compute cost.
Strengths
- State-of-the-art open-weight reasoning — near-parity with OpenAI o4-mini on core benchmarks
- Efficient MoE design: ~5.1B active params runs on a single 80GB GPU via native MXFP4 quantization
- Permissive Apache 2.0 license allowing commercial use and fine-tuning
- Configurable reasoning effort (low/medium/high) with full chain-of-thought access
- Strong agentic tooling: function calling, web browsing, Python execution, structured outputs
- 131K-token context window for long documents and agent traces
- Self-hostable open weights — no vendor lock-in, deployable on-prem
Best for
- On-premise / private reasoning assistants where data cannot leave the org
- Agentic workflows requiring tool calling, browsing, and code execution
- Cost-controlled high-volume inference via self-hosting or cheap third-party providers
- Math, science, and competition-coding tasks (strong AIME and Codeforces results)
- Fine-tuning a frontier-class open model for domain-specific applications
- Research into chain-of-thought reasoning with fully visible reasoning traces
How to access
| Provider | Model ID |
|---|---|
| OpenRouter ↗ | openai/gpt-oss-120b |
| Hugging Face ↗ | openai/gpt-oss-120b |
| Fireworks AI ↗ | accounts/fireworks/models/gpt-oss-120b |
| NVIDIA NIM ↗ | openai/gpt-oss-120b |
| AWS Bedrock ↗ | openai.gpt-oss-120b |
gpt-oss (Open Weight) — every version
The full lineage of the gpt-oss (Open Weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| gpt-oss-120bcurrent | 2025-08-05 | — | Apache-2.0 |
| gpt-oss-20b | 2025-08-05 | — | Apache-2.0 |
FAQ
Is gpt-oss-120b free and open source?
The weights are released under the permissive Apache 2.0 license, so you can download, run, fine-tune, and deploy gpt-oss-120b commercially without copyleft restrictions. Running it still costs compute — either your own hardware or a paid inference provider — but there are no license fees.
What hardware do I need to run gpt-oss-120b?
Thanks to native MXFP4 quantization of its Mixture-of-Experts weights, gpt-oss-120b fits on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X. It has 116.8B total parameters but activates only about 5.1B per token.
How does gpt-oss-120b compare to OpenAI's o4-mini?
OpenAI reports gpt-oss-120b reaches near-parity with the proprietary o4-mini on core reasoning benchmarks, and it matches or exceeds o4-mini on competition math (AIME), health questions (HealthBench), and tool calling, while being fully open-weight and self-hostable.
Is gpt-oss-120b multimodal?
No. gpt-oss-120b is text-only — it does not accept image, audio, or video input. It supports a 131,072-token context window, configurable reasoning effort, and native tool use including function calling, browsing, and Python execution.