Overview
GPT-5.3-Codex-Spark is OpenAI's ultra-fast, text-only coding model. It was released on February 12, 2026 as a research preview for ChatGPT Pro subscribers in the Codex app, the Codex CLI, and the IDE extension (e.g., VS Code). OpenAI positions it as a smaller, speed-optimized version of GPT-5.3-Codex built for real-time, in-the-moment coding: making targeted edits, reshaping logic, and refining interfaces with results that feel near-instant. It is not aimed at long, multi-step autonomous runs.
The defining feature of Codex-Spark is throughput: it streams more than 1,000 tokens per second when served on ultra-low-latency hardware. That hardware is the Cerebras Wafer-Scale Engine 3 (WSE-3), which makes Codex-Spark OpenAI's first model deployed on production silicon outside its long-standing Nvidia stack. OpenAI calls this the first milestone in its multi-billion-dollar partnership with Cerebras, and is keeping the research preview narrow while the two companies ramp datacenter capacity. At launch, the model is text-only with a 128k-token context window.
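To make the throughput figure concrete, here is a back-of-the-envelope sketch that converts tokens per second into wall-clock streaming time. It is not a benchmark. It assumes a constant decode rate, takes OpenAI's "more than 1,000 tokens per second" as a flat 1,000, and compares against a hypothetical baseline of 100 tokens per second chosen only to illustrate a tenfold gap. The time-to-first-token parameter `ttft_s` is likewise an assumed input, not a published number.

```python
def generation_time(output_tokens: int, tokens_per_second: float, ttft_s: float = 0.0) -> float:
    """Estimate wall-clock seconds to stream a response.

    Simple model: time-to-first-token plus output tokens divided by a
    constant decode rate. Real streams vary, so treat results as rough.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_s + output_tokens / tokens_per_second


# OpenAI's published figure is "more than 1,000 tokens/s"; 1,000 is a conservative floor.
SPARK_TPS = 1_000
# Hypothetical baseline, ~10x slower, used only to illustrate an order-of-magnitude gap.
BASELINE_TPS = 100

for tokens in (200, 2_000):
    spark = generation_time(tokens, SPARK_TPS)
    baseline = generation_time(tokens, BASELINE_TPS)
    print(f"{tokens:>5} tokens: Spark ~{spark:.1f} s, baseline ~{baseline:.1f} s")
```

Under these assumptions a 2,000-token edit streams in about 2 seconds instead of about 20, which is the difference between staying in flow and waiting.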
OpenAI frames Codex-Spark as a complement to the full GPT-5.3-Codex rather than a replacement: Spark for rapid iteration and immediate feedback, the larger Codex for deeper reasoning and long-running tasks. According to OpenAI, Spark produces more capable responses than GPT-5.1-Codex-mini and finishes agentic software-engineering tasks (evaluated on SWE-Bench Pro and Terminal-Bench 2.0) in a fraction of the time the full Codex needs. OpenAI did not publish absolute benchmark scores for Spark, and it notes that the larger Codex models still deliver higher absolute quality on heavy multi-step reasoning.
| Spec | Value |
|---|---|
| Released | 2026-02-12 |
| License | Proprietary (OpenAI). Closed model, no public weights; access via ChatGPT Pro and limited design-partner API. |
| Weights | API only |
| Parameters | Not disclosed (OpenAI describes it only as "a smaller version of GPT-5.3-Codex") |
| Context | 128K tokens |
| Max output | Not disclosed by OpenAI |
| Architecture | A smaller, speed-optimized variant of GPT-5.3-Codex, served on the Cerebras Wafer-Scale Engine 3 (WSE-3): OpenAI's first production model deployment on non-Nvidia silicon. The WSE-3's single-wafer design and on-chip SRAM let Spark stream output at over 1,000 tokens per second. Beyond describing it as a smaller version of GPT-5.3-Codex, OpenAI has not disclosed parameter count, layer configuration, or training details. |
| Knowledge cutoff | Not officially published for the Spark variant |
| Modalities | text |
| Status | Research preview (launched February 12, 2026 for ChatGPT Pro). OpenAI's June 2026 Codex deprecations retired GPT-5.2-Codex and the full GPT-5.3-Codex, but Codex-Spark was explicitly exempted and remained available. |
Strengths
- Extreme generation speed — over 1,000 tokens per second, roughly an order of magnitude faster than the full GPT-5.3-Codex
- Near-instant feedback loop ideal for iterative, interactive coding (targeted edits, refactors, UI tweaks)
- First OpenAI model served on Cerebras WSE-3, whose on-chip memory sidesteps the GPU memory-bandwidth wall and multi-chip interconnect latency
- 128k context window, enough for active files and immediate project dependencies
- More capable than GPT-5.1-Codex-mini per OpenAI, while far faster than the full GPT-5.3-Codex
- Integrated directly into the Codex app, CLI, and IDE extension for ChatGPT Pro users
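Whether "active files and immediate project dependencies" actually fit in 128K tokens depends on their size. The sketch below estimates this with the common rule of thumb of roughly four characters per token. That is only an approximation, since real counts depend on the tokenizer and the content. It then greedily packs files in priority order. Because OpenAI has not disclosed Spark's maximum output length, the `reserve_for_output` default of 16,000 tokens is an arbitrary placeholder. `estimate_tokens` and `pack_context` are hypothetical helpers, not part of any OpenAI SDK.

```python
import math
from typing import List, Tuple

CONTEXT_WINDOW = 128_000  # tokens, from the spec table


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (~4 chars/token heuristic); not a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def pack_context(
    files: List[Tuple[str, str]],
    reserve_for_output: int = 16_000,  # placeholder: Spark's max output is undisclosed
    window: int = CONTEXT_WINDOW,
) -> Tuple[List[str], int]:
    """Greedily include (name, text) files, in priority order, within the budget.

    Returns the names that fit and the estimated tokens they use. Files that
    would overflow are skipped, so smaller files later in the list can still fit.
    """
    budget = window - reserve_for_output
    included: List[str] = []
    used = 0
    for name, text in files:
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue
        included.append(name)
        used += cost
    return included, used


files = [
    ("editor.py", "x" * 400_000),  # ~100,000 tokens
    ("utils.py", "y" * 80_000),    # ~20,000 tokens: would overflow, so it is skipped
    ("config.py", "z" * 4_000),    # ~1,000 tokens
]
print(pack_context(files))  # (['editor.py', 'config.py'], 101000)
```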
Best for
- Rapid prototyping and quick iteration on code where latency matters
- Targeted, in-the-moment edits — reshaping logic or refining a function and seeing results immediately
- Interactive frontend/UI loops where you tweak and re-run constantly
- Pair-programming-style real-time collaboration inside the Codex CLI or IDE
- Fast subtasks where you'd otherwise wait on a slower frontier model
- Demos and live coding where near-instant output keeps the flow going
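OpenAI's framing (Spark for rapid iteration, the larger Codex for deeper reasoning and long-running tasks) and the use cases above can be expressed as a simple routing rule. This is a hypothetical heuristic, not an OpenAI feature. The `Task` fields, the three-step threshold, and the fallback model ID are illustrative assumptions. In particular, `gpt-5.3-codex` is only a placeholder, since the full GPT-5.3-Codex was retired in June 2026. Substitute whichever larger Codex model your account actually exposes.

```python
from dataclasses import dataclass

SPARK_MODEL = "gpt-5.3-codex-spark"  # model ID from the access table
FALLBACK_MODEL = "gpt-5.3-codex"     # placeholder for a larger Codex model


@dataclass
class Task:
    description: str
    expected_steps: int         # rough count of edit/run/test cycles
    needs_deep_reasoning: bool  # e.g. cross-cutting design or hard debugging


def pick_model(task: Task, max_spark_steps: int = 3) -> str:
    """Route short, targeted tasks to Spark and long or reasoning-heavy ones to the fallback.

    The threshold is an arbitrary illustration, not an OpenAI recommendation.
    """
    if task.needs_deep_reasoning or task.expected_steps > max_spark_steps:
        return FALLBACK_MODEL
    return SPARK_MODEL


print(pick_model(Task("rename a prop and fix its call sites", 1, False)))
print(pick_model(Task("migrate the build system and fix failing tests", 12, True)))
```

Keeping the rule explicit makes it easy to tune the threshold against your own latency and quality observations.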
How to access
| Provider | Model ID |
|---|---|
| OpenAI | gpt-5.3-codex-spark |
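For accounts that do have API access (limited to design partners at launch), a request would presumably use the model ID above. The sketch below assumes the endpoint behaves like OpenAI's standard Responses API in the official `openai` Python SDK, streaming text through `response.output_text.delta` events. That assumption and the helper names `collect_text` and `stream_edit` are illustrative, not documented for Spark. Most ChatGPT Pro users will reach Spark through the Codex app, CLI, or IDE extension instead, where no code is needed.

```python
import time
from types import SimpleNamespace
from typing import Iterable, Tuple

MODEL_ID = "gpt-5.3-codex-spark"


def collect_text(events: Iterable) -> str:
    """Join the text deltas from a Responses API event stream, ignoring other events."""
    parts = []
    for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


def stream_edit(prompt: str) -> Tuple[str, float]:
    """Send one streaming request; return (text, elapsed seconds).

    Assumes `pip install openai`, OPENAI_API_KEY set, and API access to MODEL_ID.
    The SDK is imported here so collect_text works without it installed.
    """
    from openai import OpenAI

    client = OpenAI()
    start = time.perf_counter()
    stream = client.responses.create(model=MODEL_ID, input=prompt, stream=True)
    text = collect_text(stream)
    return text, time.perf_counter() - start


# Offline check with stand-in events (no network call is made here).
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="def add(a, b):"),
    SimpleNamespace(type="response.output_text.delta", delta=" return a + b"),
    SimpleNamespace(type="response.completed"),
]
print(collect_text(fake_events))  # def add(a, b): return a + b
```

With access in place, calling `stream_edit` with a short edit instruction returns the generated text along with the elapsed wall-clock time, a quick way to check the throughput you actually observe.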
Codex — every version
The full lineage of the Codex line, newest first.
| Version | Released | Context | License |
|---|---|---|---|
| GPT-5.3-Codex-Spark | 2026-02-12 | 128K | Proprietary |
| GPT-5.3-Codex (current) | 2026-02-05 | — | Proprietary |
| GPT-5.2-Codex | 2025-12-11 | — | Proprietary |
| GPT-5.1-Codex | 2025-11-19 | — | Proprietary |
| GPT-5-Codex | 2025-09 | — | Proprietary |
FAQ
What is GPT-5.3-Codex-Spark?
It is OpenAI's ultra-fast, text-only coding model, released February 12, 2026 as a research preview for ChatGPT Pro users. It is a smaller, speed-optimized version of GPT-5.3-Codex, built for real-time coding — quick edits and iteration with near-instant output rather than long autonomous runs.
How fast is GPT-5.3-Codex-Spark?
OpenAI says it streams more than 1,000 tokens per second when served on ultra-low-latency hardware, roughly an order of magnitude faster than the full GPT-5.3-Codex. That speed comes from running on the Cerebras Wafer-Scale Engine 3 (WSE-3).
Why does it run on Cerebras instead of Nvidia?
Codex-Spark is OpenAI's first model deployed on production silicon outside its long-standing Nvidia stack, marking the first milestone in OpenAI's partnership with Cerebras. The WSE-3's single-wafer design with on-chip memory enables the very low latency and high throughput Spark needs for real-time coding.
Can I use GPT-5.3-Codex-Spark in the API, and is it still available?
At launch it was a research preview for ChatGPT Pro users in the Codex app, CLI, and IDE extension, with API access limited to design partners; OpenAI did not publish a per-token API price. Unlike the full GPT-5.3-Codex, which OpenAI retired in its June 2026 Codex deprecations, Codex-Spark was exempted and remained available.