AI/TLDR

gpt-oss-120b

OpenAI's open-weight 117B MoE reasoning model under Apache 2.0

Overview

gpt-oss-120b is OpenAI's larger open-weight language model, released August 5, 2025 alongside the smaller gpt-oss-20b. It is a 116.8B-parameter Mixture-of-Experts transformer that activates only about 5.1B parameters per token, and it ships under the permissive Apache 2.0 license — the first open-weight models OpenAI had released since GPT-2. The weights are freely downloadable from Hugging Face and the openai/gpt-oss GitHub repository.

Built for high-end reasoning and agentic work, gpt-oss-120b exposes a configurable reasoning effort (low / medium / high) with full chain-of-thought visibility, and natively supports function calling, web browsing, Python tool use, and structured outputs. OpenAI positions it as reaching near-parity with its proprietary o4-mini model on core reasoning benchmarks while running on a single 80GB GPU thanks to native MXFP4 quantization of the expert weights.

Because gpt-oss-120b is open-weight rather than served first-party by OpenAI, it is hosted across many inference providers (OpenRouter, Fireworks, Together, DeepInfra, Novita, Groq, AWS Bedrock, Databricks, and others), each setting its own price. It is text-only, trained primarily on English STEM, coding, and general-knowledge data, with a June 2024 knowledge cutoff and a 131,072-token context window.

Released2025-08-05
LicenseApache 2.0
WeightsOpen weights
Parameters116.8B total / 5.1B active (MoE)
Context131K
Max output131K
ArchitectureMixture-of-Experts transformer with 36 layers, 128 experts (top-4 active per token), and roughly 5.1B active parameters per token out of 116.8B total. Uses Grouped Query Attention (64 query heads, 8 key-value heads), alternating banded-window and dense attention, rotary position embeddings extended via YaRN to a 131,072-token context, and native MXFP4 quantization of the MoE weights so the model fits on a single 80GB GPU (NVIDIA H100 or AMD MI300X). Supports configurable reasoning effort (low / medium / high) and is trained for OpenAI's "harmony" response format. Text-only.
Knowledge cutoffJune 2024
ModalitiesText
StatusAvailable

Benchmarks

  1. MMLU90%
  2. GPQA Diamond (no tools)80.1%
  3. AIME 2024 (with tools)96.6%
  4. AIME 2025 (with tools)97.9%
  5. SWE-bench Verified62.4%
  6. HealthBench57.6%
  7. Humanity's Last Exam (with tools)19%
  8. Tau-Bench Retail67.8%
  9. Artificial Analysis Intelligence Index (high)24%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.039 / 1M tokens per 1M tokens
Output$0.18 / 1M tokens per 1M tokens

gpt-oss-120b is open-weight, so it is not served first-party by OpenAI — prices vary by host. Figures shown are a representative low rate from OpenRouter providers (e.g. DeepInfra); other providers charge more. Self-hosting incurs only your own compute cost.

Pricing source ↗

Strengths

  • State-of-the-art open-weight reasoning — near-parity with OpenAI o4-mini on core benchmarks
  • Efficient MoE design: ~5.1B active params runs on a single 80GB GPU via native MXFP4 quantization
  • Permissive Apache 2.0 license allowing commercial use and fine-tuning
  • Configurable reasoning effort (low/medium/high) with full chain-of-thought access
  • Strong agentic tooling: function calling, web browsing, Python execution, structured outputs
  • 131K-token context window for long documents and agent traces
  • Self-hostable open weights — no vendor lock-in, deployable on-prem

Best for

  • On-premise / private reasoning assistants where data cannot leave the org
  • Agentic workflows requiring tool calling, browsing, and code execution
  • Cost-controlled high-volume inference via self-hosting or cheap third-party providers
  • Math, science, and competition-coding tasks (strong AIME and Codeforces results)
  • Fine-tuning a frontier-class open model for domain-specific applications
  • Research into chain-of-thought reasoning with fully visible reasoning traces

How to access

ProviderModel ID
OpenRouter ↗openai/gpt-oss-120b
Hugging Face ↗openai/gpt-oss-120b
Fireworks AI ↗accounts/fireworks/models/gpt-oss-120b
NVIDIA NIM ↗openai/gpt-oss-120b
AWS Bedrock ↗openai.gpt-oss-120b

gpt-oss (Open Weight) — every version

The full lineage of the gpt-oss (Open Weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
gpt-oss-120bcurrent2025-08-05Apache-2.0
gpt-oss-20b2025-08-05Apache-2.0

FAQ

Is gpt-oss-120b free and open source?

The weights are released under the permissive Apache 2.0 license, so you can download, run, fine-tune, and deploy gpt-oss-120b commercially without copyleft restrictions. Running it still costs compute — either your own hardware or a paid inference provider — but there are no license fees.

What hardware do I need to run gpt-oss-120b?

Thanks to native MXFP4 quantization of its Mixture-of-Experts weights, gpt-oss-120b fits on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X. It has 116.8B total parameters but activates only about 5.1B per token.

How does gpt-oss-120b compare to OpenAI's o4-mini?

OpenAI reports gpt-oss-120b reaches near-parity with the proprietary o4-mini on core reasoning benchmarks, and it matches or exceeds o4-mini on competition math (AIME), health questions (HealthBench), and tool calling, while being fully open-weight and self-hostable.

Is gpt-oss-120b multimodal?

No. gpt-oss-120b is text-only — it does not accept image, audio, or video input. It supports a 131,072-token context window, configurable reasoning effort, and native tool use including function calling, browsing, and Python execution.