Grok 4 Fast

xAI's cost-efficient Grok 4 with a 2M-token context window.

Overview

Grok 4 Fast is xAI's cost-optimized member of the Grok 4 family, released on September 19, 2025. It is designed to deliver close-to-frontier quality at a fraction of the price of the full Grok 4 model, pairing aggressive token efficiency with a very large 2 million-token context window. xAI markets it as its most cost-efficient model, aimed at high-volume production workloads, search, and agent loops where price per answer matters.

What sets Grok 4 Fast apart architecturally is its unified design: a single set of weights serves both a reasoning mode (extended chain-of-thought) and a fast non-reasoning mode, exposed through the grok-4-fast-reasoning and grok-4-fast-non-reasoning endpoints. xAI reports that Grok 4 Fast reaches Grok 4-level scores on several benchmarks while spending roughly 40% fewer thinking tokens, which combined with low per-token rates makes it dramatically cheaper to run.

Grok 4 Fast accepts text and image inputs and returns text, with native tool use including web and X (Twitter) search and link-following for grounded, up-to-date answers. API pricing starts at $0.20 per million input tokens and $0.50 per million output tokens, with cached input at $0.05 per million, placing it among the cheapest frontier-adjacent APIs available at launch.

Released	2025-09
License	Proprietary
Weights	API only
Parameters	Undisclosed
Context	2M
Architecture	A single-weights model from xAI that serves both reasoning and non-reasoning modes from one checkpoint, so the same model can answer quickly or think step-by-step depending on the request. xAI positions it as delivering Grok 4-class quality while using roughly 40% fewer thinking tokens, which is what makes its per-answer cost so low. It supports native tool use, including web and X search.
Knowledge cutoff	Not publicly disclosed
Modalities	Text, Vision
Status	Available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.20 / 1M tokens per 1M tokens
Cached input	$0.05 / 1M tokens per 1M tokens
Output	$0.50 / 1M tokens per 1M tokens

Base rates for context up to 128K tokens; a higher tier applies above 128K. Verify current rates in the xAI console.

Pricing source ↗

Strengths

Very large 2 million-token context window for long documents, codebases, and multi-step agent histories
Low API cost ($0.20 input / $0.50 output per million tokens, $0.05 cached) with strong benchmark scores
Unified single-weights design exposes both fast and deep-reasoning modes without switching models
High token efficiency — roughly 40% fewer thinking tokens than Grok 4 for comparable quality
Native tool use with built-in web and X search for grounded, current answers
Multimodal text + image input

Best for

High-volume production tasks where cost per answer is the deciding factor
Long-context work: analyzing large documents, transcripts, or entire codebases within the 2M window
Agentic search and research loops that benefit from native web and X search
Real-time coding assistance and competitive-math-style reasoning
Latency- and budget-sensitive chat and summarization at scale
Multimodal tasks combining text and image inputs

How to access

Provider	Model ID
xAI ↗	`grok-4-fast-reasoning`
xAI ↗	`grok-4-fast-non-reasoning`
OpenRouter ↗	`x-ai/grok-4-fast`

Grok Fast — every version

The full lineage of the Grok Fast line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Grok 4.1 Fastcurrent	2025-11-19	—	Proprietary
Grok 4 Fast	2025-09	—	Proprietary

FAQ

When was Grok 4 Fast released?

xAI released Grok 4 Fast on September 19, 2025, as the cost-optimized member of the Grok 4 family.

How big is Grok 4 Fast's context window?

Grok 4 Fast supports a context window of up to 2 million tokens, large enough for long documents, full codebases, and extended agent histories.

How much does the Grok 4 Fast API cost?

Base pricing is $0.20 per million input tokens and $0.50 per million output tokens, with cached input at $0.05 per million. A higher tier applies for context above 128K tokens; check the xAI console for current rates.

What is the difference between the reasoning and non-reasoning modes?

Grok 4 Fast uses a single set of weights that serves both modes. The reasoning endpoint (grok-4-fast-reasoning) does extended chain-of-thought for harder problems, while the non-reasoning endpoint (grok-4-fast-non-reasoning) answers quickly for simpler tasks.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Grok Fast — every version

// FAQ