Grok 4.1 Fast

xAI's agentic tool-calling model: 2M-token context, a low-latency non-reasoning mode and a deeper reasoning mode, shipped with the Agent Tools API.

Overview

Grok 4.1 Fast is a large language model from xAI, part of the Grok Fast line and released on November 19, 2025 alongside the Agent Tools API. It is positioned as xAI's best tool-calling and agentic model — built for high-volume, real-world tasks like customer support and deep research rather than as the absolute top "brain" (that role belongs to Grok 4 / Grok 4 Heavy). It is tuned from the Grok 4.1 base, which itself emphasized lower hallucination and stronger conversational quality.

The model carries a 2-million-token context window and accepts both text and images, returning text. It is offered as a single model in two modes: a low-latency non-reasoning mode (model ID grok-4-1-fast-non-reasoning) for instant replies, and a reasoning mode (grok-4-1-fast-reasoning) that emits thinking tokens for multi-step problems. The mode is selected through a reasoning parameter in the API, and pricing is the same either way.

xAI trained Grok 4.1 Fast with heavy reinforcement learning across many simulated tool-use environments, which is reflected in its agentic benchmark results. At launch xAI made the model and the bundled Agent Tools API free for a two-week window (through December 3, 2025) via partners such as OpenRouter, before settling into low per-token pricing.

Released	2025-11-19
License	Proprietary
Weights	API only
Context	2M
Architecture	Proprietary transformer trained with large-scale reinforcement learning on tool use across many simulated environments, tuned from the Grok 4.1 base. Exposes one model in two modes — a low-latency non-reasoning mode and a reasoning mode that emits thinking tokens — toggled by a reasoning parameter rather than separate weights.
Modalities	Text, Vision
Status	Available

Benchmarks

Agentic search benchmark comparison from xAI's Grok 4.1 Fast launch post: Grok 4.1 Fast (with Agent Tools API) vs GPT-5, Claude Sonnet 4.5, and Gemini 3 Pro on Reka Research-Eval, FRAMES, and X Browse (Score and Avg. Cost per query).

Benchmark	Grok 4.1 Fast	GPT-5	Claude Sonnet 4.5	Gemini 3 Pro
Reka Research-Eval (Score)	63.9%	45.5%	41.2%	55.9%
Reka Research-Eval (Avg. Cost)	0.046 $	0.107 $	0.065 $	—
FRAMES (Score)	87.6%	86%	85%	90.9%
FRAMES (Avg. Cost)	0.048 $	0.058 $	0.078 $	—
X Browse (Score)	56.3%	24.2%	14.6%	26.5%
X Browse (Avg. Cost)	0.091 $	0.198 $	0.126 $	—

Comparison source ↗

This model's scores

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.20 / 1M tokens per 1M tokens
Cached input	$0.05 / 1M tokens per 1M tokens
Output	$0.50 / 1M tokens per 1M tokens

Same price for reasoning and non-reasoning modes. The bundled Agent Tools API is billed separately at no more than $5 per 1,000 successful tool calls. Model and Agent Tools API were free through Dec 3, 2025 at launch.

Pricing source ↗

Strengths

Frontier-level tool calling and agentic workflows — top results on τ²-bench Telecom and Berkeley Function Calling v4
2M-token context window that holds up on long-context retrieval far better than Grok 4
Two modes from one model: pick instant non-reasoning replies or deeper reasoning via a single API parameter
Roughly half the hallucination rate of the earlier Grok 4 Fast, at similar task accuracy
Very low, predictable pricing ($0.20/$0.50 per million tokens) with cheap cached input for high-volume agents
Ships with a bundled Agent Tools API (web/code/file tools) capped at no more than $5 per 1,000 successful tool calls

Best for

Customer-support and helpdesk agents that chain many tool calls
Deep-research assistants that fan out web search and synthesize long contexts
Multi-step agentic automation and orchestration over external tools/APIs
High-volume API workloads where latency and per-token cost matter
Long-document analysis and retrieval over codebases or large corpora (up to 2M tokens)
Vision-assisted agent steps that read screenshots, charts, or scanned pages (JPG/PNG)

How to access

Provider	Model ID
xAI API ↗	`grok-4-1-fast-reasoning / grok-4-1-fast-non-reasoning`
OpenRouter ↗	`x-ai/grok-4.1-fast`
Oracle Cloud (OCI Generative AI) ↗	`xai.grok-4-1-fast-reasoning / xai.grok-4-1-fast-non-reasoning`

Grok Fast — every version

The full lineage of the Grok Fast line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Grok 4.1 Fastcurrent	2025-11-19	—	Proprietary
Grok 4 Fast	2025-09	—	Proprietary

FAQ

What is Grok 4.1 Fast?

Grok 4.1 Fast is xAI's agentic, tool-calling LLM released November 19, 2025. It is tuned for high-volume, real-world tasks like customer support and deep research, carries a 2M-token context window, and shipped together with xAI's Agent Tools API.

What is the difference between the reasoning and non-reasoning modes?

It is one model offered in two modes selected by an API parameter. The non-reasoning mode (grok-4-1-fast-non-reasoning) gives instant, low-latency replies; the reasoning mode (grok-4-1-fast-reasoning) emits thinking tokens for multi-step problems. Per-token pricing is the same for both.

How much does Grok 4.1 Fast cost?

Per xAI, it is $0.20 per million input tokens, $0.05 per million cached input tokens, and $0.50 per million output tokens. The bundled Agent Tools API is billed separately at no more than $5 per 1,000 successful tool calls. Both were free through December 3, 2025 at launch.

How good is Grok 4.1 Fast at tool calling?

xAI reports it scores 100% on τ²-bench Telecom and 72% on the Berkeley Function Calling Leaderboard v4, beating models like Claude Sonnet 4.5, GPT-5.1, and Grok 4 on those agentic benchmarks, and it sustains long-context quality (67%) far above Grok 4 (22%).

// Overview

// Benchmarks

This model's scores

// Pricing

// Strengths

// Best for

// How to access

// Grok Fast — every version

// FAQ