AI/TLDR

Grok 4 Fast

xAI's cost-efficient Grok 4 with a 2M-token context window.

Overview

Grok 4 Fast is xAI's cost-optimized member of the Grok 4 family, released on September 19, 2025. It is designed to deliver close-to-frontier quality at a fraction of the price of the full Grok 4 model, pairing aggressive token efficiency with a very large 2 million-token context window. xAI markets it as its most cost-efficient model, aimed at high-volume production workloads, search, and agent loops where price per answer matters.

What sets Grok 4 Fast apart architecturally is its unified design: a single set of weights serves both a reasoning mode (extended chain-of-thought) and a fast non-reasoning mode, exposed through the grok-4-fast-reasoning and grok-4-fast-non-reasoning endpoints. xAI reports that Grok 4 Fast reaches Grok 4-level scores on several benchmarks while spending roughly 40% fewer thinking tokens, which combined with low per-token rates makes it dramatically cheaper to run.

Grok 4 Fast accepts text and image inputs and returns text, with native tool use including web and X (Twitter) search and link-following for grounded, up-to-date answers. API pricing starts at $0.20 per million input tokens and $0.50 per million output tokens, with cached input at $0.05 per million, placing it among the cheapest frontier-adjacent APIs available at launch.

Released2025-09
LicenseProprietary
WeightsAPI only
ParametersUndisclosed
Context2M
ArchitectureA single-weights model from xAI that serves both reasoning and non-reasoning modes from one checkpoint, so the same model can answer quickly or think step-by-step depending on the request. xAI positions it as delivering Grok 4-class quality while using roughly 40% fewer thinking tokens, which is what makes its per-answer cost so low. It supports native tool use, including web and X search.
Knowledge cutoffNot publicly disclosed
ModalitiesText, Vision
StatusAvailable

Benchmarks

  1. GPQA Diamond85.7%
  2. AIME 2025 (no tools)92%
  3. HMMT 2025 (no tools)93.3%
  4. LiveCodeBench (Jan-May)80%
  5. Humanity's Last Exam (no tools)20%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.20 / 1M tokens per 1M tokens
Cached input$0.05 / 1M tokens per 1M tokens
Output$0.50 / 1M tokens per 1M tokens

Base rates for context up to 128K tokens; a higher tier applies above 128K. Verify current rates in the xAI console.

Pricing source ↗

Strengths

  • Very large 2 million-token context window for long documents, codebases, and multi-step agent histories
  • Low API cost ($0.20 input / $0.50 output per million tokens, $0.05 cached) with strong benchmark scores
  • Unified single-weights design exposes both fast and deep-reasoning modes without switching models
  • High token efficiency — roughly 40% fewer thinking tokens than Grok 4 for comparable quality
  • Native tool use with built-in web and X search for grounded, current answers
  • Multimodal text + image input

Best for

  • High-volume production tasks where cost per answer is the deciding factor
  • Long-context work: analyzing large documents, transcripts, or entire codebases within the 2M window
  • Agentic search and research loops that benefit from native web and X search
  • Real-time coding assistance and competitive-math-style reasoning
  • Latency- and budget-sensitive chat and summarization at scale
  • Multimodal tasks combining text and image inputs

How to access

ProviderModel ID
xAI ↗grok-4-fast-reasoning
xAI ↗grok-4-fast-non-reasoning
OpenRouter ↗x-ai/grok-4-fast

Grok Fast — every version

The full lineage of the Grok Fast line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Grok 4.1 Fastcurrent2025-11-19Proprietary
Grok 4 Fast2025-09Proprietary

FAQ

When was Grok 4 Fast released?

xAI released Grok 4 Fast on September 19, 2025, as the cost-optimized member of the Grok 4 family.

How big is Grok 4 Fast's context window?

Grok 4 Fast supports a context window of up to 2 million tokens, large enough for long documents, full codebases, and extended agent histories.

How much does the Grok 4 Fast API cost?

Base pricing is $0.20 per million input tokens and $0.50 per million output tokens, with cached input at $0.05 per million. A higher tier applies for context above 128K tokens; check the xAI console for current rates.

What is the difference between the reasoning and non-reasoning modes?

Grok 4 Fast uses a single set of weights that serves both modes. The reasoning endpoint (grok-4-fast-reasoning) does extended chain-of-thought for harder problems, while the non-reasoning endpoint (grok-4-fast-non-reasoning) answers quickly for simpler tasks.