GPT-5.3-Codex-Spark

Name: GPT-5.3-Codex-Spark
Author: OpenAI

OpenAI's ultra-fast, real-time coding model on Cerebras silicon

Overview

GPT-5.3-Codex-Spark is OpenAI's ultra-fast, text-only coding model, released on February 12, 2026 as a research preview for ChatGPT Pro subscribers inside the Codex app, the Codex CLI, and the IDE/VS Code extension. OpenAI positions it as a smaller, speed-optimized version of GPT-5.3-Codex built for real-time, in-the-moment coding: making targeted edits, reshaping logic, and refining interfaces with near-instant results. It is not aimed at long, multi-step autonomous runs.

The defining feature of Codex-Spark is throughput: it streams more than 1,000 tokens per second when served on ultra-low-latency hardware. That hardware is the Cerebras Wafer-Scale Engine 3 (WSE-3), making Codex-Spark OpenAI's first model deployed on production silicon outside its long-standing Nvidia stack. OpenAI calls this the first milestone in its multi-billion-dollar partnership with Cerebras, with the research preview kept narrow while the two companies ramp datacenter capacity. At launch the model is text-only with a 128k-token context window.
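To make the throughput figure concrete, here is a rough back-of-the-envelope sketch. It assumes a constant 1,000 tokens per second (the lower bound OpenAI states) and, purely for comparison, a hypothetical baseline that is ten times slower. The function name and the baseline rate are illustrative assumptions, and real-world latency also includes time to first token, which this ignores.

```python
def stream_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream `output_tokens` at a constant decode rate.

    Ignores time to first token and network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


SPARK_TPS = 1_000.0            # stated lower bound for Codex-Spark
BASELINE_TPS = SPARK_TPS / 10  # hypothetical slower model, for comparison only

# Compare a small edit, a full-file rewrite, and a very long response.
for tokens in (200, 2_000, 20_000):
    fast = stream_seconds(tokens, SPARK_TPS)
    slow = stream_seconds(tokens, BASELINE_TPS)
    print(f"{tokens:>6} tokens: {fast:6.1f} s at Spark rate, {slow:6.1f} s at baseline rate")
```

Under these assumptions a 2,000-token rewrite streams in about two seconds rather than about twenty, which is the difference between an interactive loop and a wait.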

OpenAI frames Codex-Spark as a complement to the full GPT-5.3-Codex rather than a replacement: Spark for rapid iteration and immediate feedback, the larger Codex for deeper reasoning and long-running tasks. According to OpenAI, Spark produces more capable responses than GPT-5.1-Codex-mini and finishes agentic-engineering tasks (evaluated on SWE-Bench Pro and Terminal-Bench 2.0) in a fraction of the time the full Codex takes. OpenAI did not publish absolute benchmark percentages for the Spark variant, and it noted that the larger Codex models still win on absolute quality for heavy multi-step reasoning.

Released	2026-02-12
License	Proprietary (OpenAI). Closed model, no public weights; access via ChatGPT Pro and limited design-partner API.
Weights	API only
Parameters	Not disclosed (OpenAI describes it only as "a smaller version of GPT-5.3-Codex")
Context	128K tokens
Max output	Not disclosed by OpenAI
Architecture	A smaller, speed-optimized variant of GPT-5.3-Codex, served on the Cerebras Wafer-Scale Engine 3 (WSE-3). This is OpenAI's first production model deployment on non-Nvidia silicon. The WSE-3's single-wafer design and on-chip SRAM let Spark stream output at over 1,000 tokens per second. OpenAI has not disclosed parameter count, layer configuration, or training details beyond describing it as a smaller version of GPT-5.3-Codex.
Knowledge cutoff	Not officially published for the Spark variant
Modalities	text
Status	Research preview (launched February 12, 2026 for ChatGPT Pro). During OpenAI's June 2026 Codex deprecations — which retired GPT-5.2-Codex and the full GPT-5.3-Codex — Codex-Spark was explicitly exempted and remained available rather than being sunset.

Strengths

Extreme generation speed — over 1,000 tokens per second, roughly an order of magnitude faster than the full GPT-5.3-Codex
Near-instant feedback loop ideal for iterative, interactive coding (targeted edits, refactors, UI tweaks)
First OpenAI model on Cerebras WSE-3, sidestepping GPU memory-wall and interconnect latency
128k context window, enough for active files and immediate project dependencies
More capable than GPT-5.1-Codex-mini per OpenAI, while being far faster than full Codex
Integrated directly into the Codex app, CLI, and IDE extension for ChatGPT Pro users
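The 128k context figure above can be turned into a quick pre-flight check before sending files to the model. The sketch below is only an approximation: it assumes "128k" means 128,000 tokens, uses a rough four-characters-per-token heuristic rather than the model's real tokenizer, and reserves a hypothetical output budget, since OpenAI has not disclosed a maximum output length.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # assumption: "128k" read as 128,000 tokens
CHARS_PER_TOKEN = 4              # rough heuristic, not the model's tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserved_output_tokens: int = 8_000) -> bool:
    """True if the estimated prompt size plus an output reserve fits the window.

    `reserved_output_tokens` is a hypothetical budget chosen for illustration.
    """
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


# Example: two source files of roughly 40 KB and 120 KB.
files = ["x" * 40_000, "y" * 120_000]
print(fits_in_context(files))
```

For anything close to the limit, an exact tokenizer count would be needed; the heuristic is only meant to catch requests that are obviously too large.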

Best for

Rapid prototyping and quick iteration on code where latency matters
Targeted, in-the-moment edits — reshaping logic or refining a function and seeing results immediately
Interactive frontend/UI loops where you tweak and re-run constantly
Pair-programming-style real-time collaboration inside the Codex CLI or IDE
Fast subtasks where you'd otherwise wait on a slower frontier model
Demos and live coding where near-instant output keeps the flow going
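The split described on this page (Spark for quick, interactive edits; the larger Codex for deep, long-running work) could be expressed as a simple routing rule. The sketch below is illustrative only: the `gpt-5.3-codex` identifier, the `Task` fields, and the step threshold are assumptions made for the example, not anything OpenAI documents.

```python
from dataclasses import dataclass

SPARK_MODEL = "gpt-5.3-codex-spark"  # model ID listed on this page
FULL_MODEL = "gpt-5.3-codex"         # assumed ID for the full model (hypothetical)
MAX_SPARK_STEPS = 3                  # hypothetical cutoff for "multi-step" work


@dataclass
class Task:
    description: str
    interactive: bool    # is a human waiting on the result?
    expected_steps: int  # rough count of agentic steps the task needs


def pick_model(task: Task) -> str:
    """Route latency-sensitive, short tasks to Spark; everything else to full Codex."""
    if task.interactive and task.expected_steps <= MAX_SPARK_STEPS:
        return SPARK_MODEL
    return FULL_MODEL


print(pick_model(Task("tweak button padding", interactive=True, expected_steps=1)))
print(pick_model(Task("migrate the ORM layer", interactive=False, expected_steps=40)))
```

A real router would likely weigh more signals (file count, test runtime, required reasoning depth), but the two-factor rule captures the trade-off the page describes.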

How to access

Provider	Model ID
OpenAI	`gpt-5.3-codex-spark`
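For readers with design-partner API access, a request would presumably carry the model ID above. The sketch below only builds a JSON body in the general shape of OpenAI's Responses API (`model`, `input`, `stream`) and does not send anything. Whether a given account accepts this model ID is an assumption, and the helper name and prompt are made up for illustration.

```python
import json

MODEL_ID = "gpt-5.3-codex-spark"  # model ID from the table above


def build_request(prompt: str, stream: bool = True) -> dict:
    """Build a request body for a text-only coding prompt.

    Streaming defaults to on, since fast token streaming is the model's
    main selling point. Nothing is sent over the network here.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"model": MODEL_ID, "input": prompt, "stream": stream}


body = build_request("Rename the variable `tmp` to `retry_count` in utils.py")
print(json.dumps(body, indent=2))
```

Most ChatGPT Pro users would not need this at all, since the research preview is exposed through the Codex app, CLI, and IDE extension rather than through direct API calls.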

Codex — every version

The full lineage of the Codex line, newest first.

Version	Released	Context	License
GPT-5.3-Codex-Spark	2026-02-12	—	Proprietary
GPT-5.3-Codex (current)	2026-02-05	—	Proprietary
GPT-5.2-Codex	2025-12-11	—	Proprietary
GPT-5.1-Codex	2025-11-19	—	Proprietary
GPT-5-Codex	2025-09	—	Proprietary

FAQ

What is GPT-5.3-Codex-Spark?

It is OpenAI's ultra-fast, text-only coding model, released February 12, 2026 as a research preview for ChatGPT Pro users. It is a smaller, speed-optimized version of GPT-5.3-Codex, built for real-time coding — quick edits and iteration with near-instant output rather than long autonomous runs.

How fast is GPT-5.3-Codex-Spark?

OpenAI says it streams more than 1,000 tokens per second when served on ultra-low-latency hardware, roughly an order of magnitude faster than the full GPT-5.3-Codex. That speed comes from running on the Cerebras Wafer-Scale Engine 3 (WSE-3).

Why does it run on Cerebras instead of Nvidia?

Codex-Spark is OpenAI's first model deployed on production silicon outside its long-standing Nvidia stack, marking the first milestone in OpenAI's partnership with Cerebras. The WSE-3's single-wafer design with on-chip memory enables the very low latency and high throughput Spark needs for real-time coding.

Can I use GPT-5.3-Codex-Spark in the API, and is it still available?

At launch it was a research preview for ChatGPT Pro users in the Codex app, CLI, and IDE extension, with only limited API access for design partners — OpenAI did not publish a per-token API price. Unlike the full GPT-5.3-Codex, which OpenAI began retiring in mid-2026, Codex-Spark was exempted from those deprecations and remained available.
