AI/TLDR

GPT-5.3-Codex-Spark

OpenAI's ultra-fast, real-time coding model on Cerebras silicon

Overview

GPT-5.3-Codex-Spark is OpenAI's ultra-fast, text-only coding model, released on February 12, 2026 as a research preview for ChatGPT Pro subscribers in the Codex app, the Codex CLI, and the IDE/VS Code extension. OpenAI positions it as a smaller, speed-optimized version of GPT-5.3-Codex built for real-time coding: making targeted edits, reshaping logic, and refining interfaces with near-instant results, rather than running long, multi-step autonomous tasks.

The defining feature of Codex-Spark is throughput: it streams more than 1,000 tokens per second when served on ultra-low-latency hardware. That hardware is the Cerebras Wafer-Scale Engine 3 (WSE-3), making Codex-Spark OpenAI's first model deployed on production silicon outside its long-standing Nvidia stack. OpenAI calls this the first milestone in its multi-billion-dollar partnership with Cerebras, with the research preview kept narrow while the two companies ramp datacenter capacity. At launch the model is text-only with a 128k-token context window.
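To make the throughput figure concrete, the following is a minimal arithmetic sketch of how long a response takes to stream at a steady rate. The 1,000 tokens-per-second value is the lower bound quoted above; the 100 tokens-per-second baseline is a hypothetical slower model chosen only for comparison, and the calculation ignores time-to-first-token and network overhead.

```python
def stream_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a steady rate.

    Ignores time-to-first-token and network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


SPARK_TPS = 1_000    # lower bound quoted by OpenAI ("more than 1,000")
BASELINE_TPS = 100   # hypothetical slower model, for comparison only

for n in (200, 2_000, 20_000):
    spark = stream_seconds(n, SPARK_TPS)
    baseline = stream_seconds(n, BASELINE_TPS)
    print(f"{n:>6} tokens: {spark:5.1f}s at Spark rate, {baseline:6.1f}s at baseline")
```

Under these assumptions, a 2,000-token edit streams in about 2 seconds instead of about 20, which is the gap between an interactive loop and a noticeable wait.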

OpenAI frames Codex-Spark as the complement to the full GPT-5.3-Codex rather than a replacement: Spark for rapid iteration and immediate feedback, the larger Codex for deeper reasoning and long-running tasks. OpenAI says Spark produces more capable responses than GPT-5.1-Codex-mini and finishes agentic-engineering tasks (evaluated on SWE-Bench Pro and Terminal-Bench 2.0) in a fraction of the time the full Codex takes. It did not publish absolute benchmark percentages for the Spark variant, and it noted that for heavy multi-step reasoning the larger Codex models still win on absolute quality.

Released: 2026-02-12
License: Proprietary (OpenAI). Closed model, no public weights; access via ChatGPT Pro and limited design-partner API.
Weights: API only
Parameters: Not disclosed (OpenAI describes it only as "a smaller version of GPT-5.3-Codex")
Context: 128K tokens
Max output: Not disclosed by OpenAI
Architecture: A smaller, speed-optimized variant of GPT-5.3-Codex, served on the Cerebras Wafer-Scale Engine 3 (WSE-3), which is OpenAI's first production model deployment on non-Nvidia silicon. The WSE-3's single-wafer design and on-chip SRAM let Spark stream output at over 1,000 tokens per second. OpenAI has not disclosed parameter count, layer configuration, or training details beyond describing it as a smaller sibling of GPT-5.3-Codex.
Knowledge cutoff: Not officially published for the Spark variant
Modalities: Text
Status: Research preview (launched February 12, 2026 for ChatGPT Pro). During OpenAI's June 2026 Codex deprecations, which retired GPT-5.2-Codex and the full GPT-5.3-Codex, Codex-Spark was explicitly exempted and remained available rather than being sunset.

Strengths

  • Extreme generation speed — over 1,000 tokens per second, roughly an order of magnitude faster than the full GPT-5.3-Codex
  • Near-instant feedback loop ideal for iterative, interactive coding (targeted edits, refactors, UI tweaks)
  • First OpenAI model on Cerebras WSE-3, sidestepping GPU memory-wall and interconnect latency
  • 128k context window, enough for active files and immediate project dependencies
  • More capable than GPT-5.1-Codex-mini per OpenAI, while being far faster than full Codex
  • Integrated directly into the Codex app, CLI, and IDE extension for ChatGPT Pro users
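As a rough illustration of what a 128k window holds, the sketch below estimates whether a set of source files fits. It rests on several assumptions not taken from OpenAI: a 4-characters-per-token heuristic instead of a real tokenizer, a window of exactly 128,000 tokens, and a hypothetical `reserve_for_output` allowance (the maximum output length is not disclosed).

```python
CONTEXT_WINDOW = 128_000  # assumed exact size; the spec only says "128k"
CHARS_PER_TOKEN = 4       # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserve_for_output: int = 8_000):
    """Return (fits, tokens_used, input_budget) for a list of strings.

    reserve_for_output is a hypothetical allowance for the model's reply.
    """
    used = sum(estimate_tokens(t) for t in texts)
    budget = CONTEXT_WINDOW - reserve_for_output
    return used <= budget, used, budget


# Example: three in-memory "files" standing in for an active working set.
files = ["def f():\n    return 1\n" * 500, "x = 1\n" * 2_000, "# notes\n" * 100]
print(fits_in_context(files))
```

For real projects, a proper tokenizer would give tighter numbers; the heuristic is only meant to show the order of magnitude involved.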

Best for

  • Rapid prototyping and quick iteration on code where latency matters
  • Targeted, in-the-moment edits — reshaping logic or refining a function and seeing results immediately
  • Interactive frontend/UI loops where you tweak and re-run constantly
  • Pair-programming-style real-time collaboration inside the Codex CLI or IDE
  • Fast subtasks where you'd otherwise wait on a slower frontier model
  • Demos and live coding where near-instant output keeps the flow going
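The split described on this page (Spark for interactive work, the larger Codex for deep or long-running tasks) can be sketched as a small routing function. This is an illustrative sketch only: the `TaskKind` categories are hypothetical, and the `gpt-5.3-codex` identifier for the full model is an assumption, since this page lists a model ID only for Spark.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical task categories, for illustration only."""
    QUICK_EDIT = "quick_edit"
    UI_TWEAK = "ui_tweak"
    LONG_AGENTIC_RUN = "long_agentic_run"
    DEEP_REASONING = "deep_reasoning"


SPARK = "gpt-5.3-codex-spark"   # model ID listed on this page
FULL_CODEX = "gpt-5.3-codex"    # assumed ID for the larger model

# Latency-sensitive, interactive work goes to Spark.
INTERACTIVE = {TaskKind.QUICK_EDIT, TaskKind.UI_TWEAK}


def pick_model(kind: TaskKind) -> str:
    """Route interactive tasks to Spark and everything else to the full model."""
    return SPARK if kind in INTERACTIVE else FULL_CODEX


print(pick_model(TaskKind.QUICK_EDIT))
print(pick_model(TaskKind.LONG_AGENTIC_RUN))
```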

How to access

Provider    Model ID
OpenAI      gpt-5.3-codex-spark
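For readers who do have API access, a call might look like the sketch below. It assumes the model is reachable through the Responses API of the official `openai` Python package, which this page does not confirm; API access was limited to design partners at launch. `build_request` and `ask_spark` are hypothetical helper names.

```python
MODEL_ID = "gpt-5.3-codex-spark"  # model ID from the table above


def build_request(prompt: str) -> dict:
    """Keyword arguments for a text-in, text-out Responses API call."""
    return {"model": MODEL_ID, "input": prompt}


def ask_spark(prompt: str) -> str:
    """Send the prompt and return the model's text output.

    Assumes the account has access to the model and that OPENAI_API_KEY
    is set in the environment.
    """
    # Imported lazily so this module loads without the `openai` package.
    from openai import OpenAI

    client = OpenAI()
    response = client.responses.create(**build_request(prompt))
    return response.output_text


print(build_request("Rename `x` to `count` in: x = 0; x += 1"))
```

Calling `ask_spark(...)` performs the network request; printing `build_request(...)` only shows the payload, so the sketch can be inspected without credentials.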

Codex — every version

The full lineage of the Codex line, newest first.

Version                    Released      Context   License
GPT-5.3-Codex (current)    2026-02-05    -         Proprietary
GPT-5.3-Codex-Spark        2026-02-12    -         Proprietary
GPT-5.2-Codex              2025-12-11    -         Proprietary
GPT-5.1-Codex              2025-11-19    -         Proprietary
GPT-5-Codex                2025-09       -         Proprietary

FAQ

What is GPT-5.3-Codex-Spark?

It is OpenAI's ultra-fast, text-only coding model, released February 12, 2026 as a research preview for ChatGPT Pro users. It is a smaller, speed-optimized version of GPT-5.3-Codex, built for real-time coding — quick edits and iteration with near-instant output rather than long autonomous runs.

How fast is GPT-5.3-Codex-Spark?

OpenAI says it streams more than 1,000 tokens per second when served on ultra-low-latency hardware, roughly an order of magnitude faster than the full GPT-5.3-Codex. That speed comes from running on the Cerebras Wafer-Scale Engine 3 (WSE-3).

Why does it run on Cerebras instead of Nvidia?

Codex-Spark is OpenAI's first model deployed on production silicon outside its long-standing Nvidia stack, marking the first milestone in OpenAI's partnership with Cerebras. The WSE-3's single-wafer design with on-chip memory enables the very low latency and high throughput Spark needs for real-time coding.

Can I use GPT-5.3-Codex-Spark in the API, and is it still available?

At launch it was a research preview for ChatGPT Pro users in the Codex app, CLI, and IDE extension, with only limited API access for design partners — OpenAI did not publish a per-token API price. Unlike the full GPT-5.3-Codex, which OpenAI began retiring in mid-2026, Codex-Spark was exempted from those deprecations and remained available.