Overview
Grok 4 Fast is xAI's cost-optimized member of the Grok 4 family, released on September 19, 2025. It is designed to deliver close-to-frontier quality at a fraction of the price of the full Grok 4 model, pairing aggressive token efficiency with a very large 2 million-token context window. xAI markets it as its most cost-efficient model, aimed at high-volume production workloads, search, and agent loops where price per answer matters.
What sets Grok 4 Fast apart architecturally is its unified design: a single set of weights serves both a reasoning mode (extended chain-of-thought) and a fast non-reasoning mode, exposed through the grok-4-fast-reasoning and grok-4-fast-non-reasoning endpoints. xAI reports that Grok 4 Fast reaches Grok 4-level scores on several benchmarks while spending roughly 40% fewer thinking tokens, which combined with low per-token rates makes it dramatically cheaper to run.
Grok 4 Fast accepts text and image inputs and returns text, with native tool use including web and X (Twitter) search and link-following for grounded, up-to-date answers. API pricing starts at $0.20 per million input tokens and $0.50 per million output tokens, with cached input at $0.05 per million, placing it among the cheapest frontier-adjacent APIs available at launch.
| Released | 2025-09 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Parameters | Undisclosed |
| Context | 2M |
| Architecture | A single-weights model from xAI that serves both reasoning and non-reasoning modes from one checkpoint, so the same model can answer quickly or think step-by-step depending on the request. xAI positions it as delivering Grok 4-class quality while using roughly 40% fewer thinking tokens, which is what makes its per-answer cost so low. It supports native tool use, including web and X search. |
| Knowledge cutoff | Not publicly disclosed |
| Modalities | Text, Vision |
| Status | Available |
Benchmarks
- GPQA Diamond85.7%
- AIME 2025 (no tools)92%
- HMMT 2025 (no tools)93.3%
- LiveCodeBench (Jan-May)80%
- Humanity's Last Exam (no tools)20%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.20 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.05 / 1M tokens per 1M tokens |
| Output | $0.50 / 1M tokens per 1M tokens |
Base rates for context up to 128K tokens; a higher tier applies above 128K. Verify current rates in the xAI console.
Strengths
- Very large 2 million-token context window for long documents, codebases, and multi-step agent histories
- Low API cost ($0.20 input / $0.50 output per million tokens, $0.05 cached) with strong benchmark scores
- Unified single-weights design exposes both fast and deep-reasoning modes without switching models
- High token efficiency — roughly 40% fewer thinking tokens than Grok 4 for comparable quality
- Native tool use with built-in web and X search for grounded, current answers
- Multimodal text + image input
Best for
- High-volume production tasks where cost per answer is the deciding factor
- Long-context work: analyzing large documents, transcripts, or entire codebases within the 2M window
- Agentic search and research loops that benefit from native web and X search
- Real-time coding assistance and competitive-math-style reasoning
- Latency- and budget-sensitive chat and summarization at scale
- Multimodal tasks combining text and image inputs
How to access
| Provider | Model ID |
|---|---|
| xAI ↗ | grok-4-fast-reasoning |
| xAI ↗ | grok-4-fast-non-reasoning |
| OpenRouter ↗ | x-ai/grok-4-fast |
Grok Fast — every version
The full lineage of the Grok Fast line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Grok 4.1 Fastcurrent | 2025-11-19 | — | Proprietary |
| Grok 4 Fast | 2025-09 | — | Proprietary |
FAQ
When was Grok 4 Fast released?
xAI released Grok 4 Fast on September 19, 2025, as the cost-optimized member of the Grok 4 family.
How big is Grok 4 Fast's context window?
Grok 4 Fast supports a context window of up to 2 million tokens, large enough for long documents, full codebases, and extended agent histories.
How much does the Grok 4 Fast API cost?
Base pricing is $0.20 per million input tokens and $0.50 per million output tokens, with cached input at $0.05 per million. A higher tier applies for context above 128K tokens; check the xAI console for current rates.
What is the difference between the reasoning and non-reasoning modes?
Grok 4 Fast uses a single set of weights that serves both modes. The reasoning endpoint (grok-4-fast-reasoning) does extended chain-of-thought for harder problems, while the non-reasoning endpoint (grok-4-fast-non-reasoning) answers quickly for simpler tasks.