Overview
Grok 4.1 Fast is a large language model from xAI, part of the Grok Fast line and released on November 19, 2025 alongside the Agent Tools API. It is positioned as xAI's best tool-calling and agentic model — built for high-volume, real-world tasks like customer support and deep research rather than as the absolute top "brain" (that role belongs to Grok 4 / Grok 4 Heavy). It is tuned from the Grok 4.1 base, which itself emphasized lower hallucination and stronger conversational quality.
The model carries a 2-million-token context window and accepts both text and images, returning text. It is offered as a single model in two modes: a low-latency non-reasoning mode (model ID grok-4-1-fast-non-reasoning) for instant replies, and a reasoning mode (grok-4-1-fast-reasoning) that emits thinking tokens for multi-step problems. The mode is selected through a reasoning parameter in the API, and pricing is the same either way.
xAI trained Grok 4.1 Fast with heavy reinforcement learning across many simulated tool-use environments, which is reflected in its agentic benchmark results. At launch xAI made the model and the bundled Agent Tools API free for a two-week window (through December 3, 2025) via partners such as OpenRouter, before settling into low per-token pricing.
| Released | 2025-11-19 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Context | 2M |
| Architecture | Proprietary transformer trained with large-scale reinforcement learning on tool use across many simulated environments, tuned from the Grok 4.1 base. Exposes one model in two modes — a low-latency non-reasoning mode and a reasoning mode that emits thinking tokens — toggled by a reasoning parameter rather than separate weights. |
| Modalities | Text, Vision |
| Status | Available |
Benchmarks
Agentic search benchmark comparison from xAI's Grok 4.1 Fast launch post: Grok 4.1 Fast (with Agent Tools API) vs GPT-5, Claude Sonnet 4.5, and Gemini 3 Pro on Reka Research-Eval, FRAMES, and X Browse (Score and Avg. Cost per query).
| Benchmark | Grok 4.1 Fast | GPT-5 | Claude Sonnet 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| Reka Research-Eval (Score) | 63.9% | 45.5% | 41.2% | 55.9% |
| Reka Research-Eval (Avg. Cost) | $0.046 | $0.107 | $0.065 | — |
| FRAMES (Score) | 87.6% | 86% | 85% | 90.9% |
| FRAMES (Avg. Cost) | $0.048 | $0.058 | $0.078 | — |
| X Browse (Score) | 56.3% | 24.2% | 14.6% | 26.5% |
| X Browse (Avg. Cost) | $0.091 | $0.198 | $0.126 | — |
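To make the score and cost columns easier to compare, the table can be reduced to two derived figures: how many times more a model costs per query than Grok 4.1 Fast, and how many score points it delivers per dollar. The sketch below is illustrative only. The numbers are copied from the table, Gemini 3 Pro is left out because no cost is listed for it, and the names `RESULTS`, `cost_ratio`, and `points_per_dollar` are hypothetical helpers rather than anything published by xAI.

```python
# Score (%) and average cost per query (USD), copied from the table above.
# Gemini 3 Pro is omitted because the table lists no cost for it.
RESULTS = {
    "Reka Research-Eval": {
        "Grok 4.1 Fast": (63.9, 0.046),
        "GPT-5": (45.5, 0.107),
        "Claude Sonnet 4.5": (41.2, 0.065),
    },
    "FRAMES": {
        "Grok 4.1 Fast": (87.6, 0.048),
        "GPT-5": (86.0, 0.058),
        "Claude Sonnet 4.5": (85.0, 0.078),
    },
    "X Browse": {
        "Grok 4.1 Fast": (56.3, 0.091),
        "GPT-5": (24.2, 0.198),
        "Claude Sonnet 4.5": (14.6, 0.126),
    },
}


def cost_ratio(benchmark, model, baseline="Grok 4.1 Fast"):
    """Return how many times more `model` costs per query than `baseline`."""
    return RESULTS[benchmark][model][1] / RESULTS[benchmark][baseline][1]


def points_per_dollar(benchmark, model):
    """Return score points per USD of average query cost (a rough efficiency proxy)."""
    score, cost = RESULTS[benchmark][model]
    return score / cost


if __name__ == "__main__":
    for benchmark, models in RESULTS.items():
        for model in models:
            print(
                f"{benchmark:20s} {model:18s} "
                f"cost x{cost_ratio(benchmark, model):.2f}  "
                f"{points_per_dollar(benchmark, model):.0f} pts/$"
            )
```

Points per dollar is a crude proxy: it treats score as linear in value, so it is best read alongside the raw scores rather than in place of them.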
This model's scores
- τ²-bench (Telecom), agentic tool use: 100%
- Berkeley Function Calling Leaderboard v4 (BFCL-V4): 72%
- Long-context (2M window) retrieval: 67%
Scores are on a 0–100 scale; higher is better.
Pricing
| Input | $0.20 per 1M tokens |
|---|---|
| Cached input | $0.05 per 1M tokens |
| Output | $0.50 per 1M tokens |
Pricing is the same for reasoning and non-reasoning modes. The bundled Agent Tools API is billed separately at no more than $5 per 1,000 successful tool calls. The model and the Agent Tools API were free through Dec 3, 2025 at launch.
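As a rough illustration of how these rates combine, the sketch below estimates an upper-bound bill for a workload from billed token counts and tool calls. It is a minimal sketch under stated assumptions: the function `estimate_cost` and its parameter names are hypothetical, cached input tokens are assumed to be counted separately from regular input tokens, and tool calls are charged at the stated ceiling of $5 per 1,000, so the actual charge may be lower.

```python
from decimal import Decimal

# USD per 1M tokens, taken from the pricing table above.
INPUT_PER_M = Decimal("0.20")
CACHED_INPUT_PER_M = Decimal("0.05")
OUTPUT_PER_M = Decimal("0.50")

# Ceiling price per tool call: "no more than $5 per 1,000 successful tool calls".
TOOL_CALL_MAX = Decimal("5") / Decimal("1000")

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0, tool_calls=0):
    """Return an upper-bound USD cost as a Decimal.

    Assumption: `input_tokens` excludes `cached_input_tokens`, i.e. each
    input token is billed at exactly one of the two input rates.
    """
    token_cost = (
        Decimal(input_tokens) * INPUT_PER_M
        + Decimal(cached_input_tokens) * CACHED_INPUT_PER_M
        + Decimal(output_tokens) * OUTPUT_PER_M
    ) / TOKENS_PER_MILLION
    return token_cost + Decimal(tool_calls) * TOOL_CALL_MAX


if __name__ == "__main__":
    # Example workload: 3M fresh input tokens, 1M cached, 500k output, 200 tool calls.
    total = estimate_cost(3_000_000, 500_000, cached_input_tokens=1_000_000, tool_calls=200)
    print(f"Estimated upper bound: ${total:.2f}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without binary rounding drift when many requests are summed.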
Strengths
- Frontier-level tool calling and agentic workflows — top results on τ²-bench Telecom and Berkeley Function Calling v4
- 2M-token context window that holds up on long-context retrieval far better than Grok 4
- Two modes from one model: pick instant non-reasoning replies or deeper reasoning via a single API parameter
- Roughly half the hallucination rate of the earlier Grok 4 Fast, at similar task accuracy
- Very low, predictable pricing ($0.20/$0.50 per million tokens) with cheap cached input for high-volume agents
- Ships with a bundled Agent Tools API (web/code/file tools) priced at no more than $5 per 1,000 successful tool calls
Best for
- Customer-support and helpdesk agents that chain many tool calls
- Deep-research assistants that fan out web search and synthesize long contexts
- Multi-step agentic automation and orchestration over external tools/APIs
- High-volume API workloads where latency and per-token cost matter
- Long-document analysis and retrieval over codebases or large corpora (up to 2M tokens)
- Vision-assisted agent steps that read screenshots, charts, or scanned pages (JPG/PNG)
How to access
| Provider | Model ID |
|---|---|
| xAI API | grok-4-1-fast-reasoning / grok-4-1-fast-non-reasoning |
| OpenRouter | x-ai/grok-4.1-fast |
| Oracle Cloud (OCI Generative AI) | xai.grok-4-1-fast-reasoning / xai.grok-4-1-fast-non-reasoning |
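Because the model ID differs by provider and, on two of them, by mode, a small lookup can keep that choice in one place. The sketch below is an illustration only: the provider keys (`"xai"`, `"openrouter"`, `"oci"`) and the helper `resolve_model_id` are hypothetical names, and only the ID strings come from the table. OpenRouter lists a single ID, so the helper returns it for both modes and leaves mode selection to that provider's request parameters.

```python
# Model IDs copied from the table above; the provider keys are hypothetical.
MODEL_IDS = {
    "xai": {
        "reasoning": "grok-4-1-fast-reasoning",
        "non-reasoning": "grok-4-1-fast-non-reasoning",
    },
    "openrouter": {
        # A single ID is listed for OpenRouter, so both modes map to it.
        "reasoning": "x-ai/grok-4.1-fast",
        "non-reasoning": "x-ai/grok-4.1-fast",
    },
    "oci": {
        "reasoning": "xai.grok-4-1-fast-reasoning",
        "non-reasoning": "xai.grok-4-1-fast-non-reasoning",
    },
}


def resolve_model_id(provider, reasoning):
    """Return the model ID for a provider key and a boolean reasoning flag."""
    mode = "reasoning" if reasoning else "non-reasoning"
    try:
        return MODEL_IDS[provider][mode]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


if __name__ == "__main__":
    print(resolve_model_id("xai", reasoning=True))
    print(resolve_model_id("oci", reasoning=False))
```

Keeping the mapping in data rather than in branching code means a new provider or renamed ID is a one-line change.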
Grok Fast — every version
The full lineage of the Grok Fast line, newest first.
| Version | Released | Context | License |
|---|---|---|---|
| Grok 4.1 Fast (current) | 2025-11-19 | 2M | Proprietary |
| Grok 4 Fast | 2025-09 | — | Proprietary |
FAQ
What is Grok 4.1 Fast?
Grok 4.1 Fast is xAI's agentic, tool-calling LLM released November 19, 2025. It is tuned for high-volume, real-world tasks like customer support and deep research, carries a 2M-token context window, and shipped together with xAI's Agent Tools API.
What is the difference between the reasoning and non-reasoning modes?
It is one model offered in two modes selected by an API parameter. The non-reasoning mode (grok-4-1-fast-non-reasoning) gives instant, low-latency replies; the reasoning mode (grok-4-1-fast-reasoning) emits thinking tokens for multi-step problems. Per-token pricing is the same for both.
How much does Grok 4.1 Fast cost?
Per xAI, it is $0.20 per million input tokens, $0.05 per million cached input tokens, and $0.50 per million output tokens. The bundled Agent Tools API is billed separately at no more than $5 per 1,000 successful tool calls. Both were free through December 3, 2025 at launch.
How good is Grok 4.1 Fast at tool calling?
xAI reports it scores 100% on τ²-bench Telecom and 72% on the Berkeley Function Calling Leaderboard v4, beating models like Claude Sonnet 4.5, GPT-5.1, and Grok 4 on those agentic benchmarks, and it sustains long-context quality (67%) far above Grok 4 (22%).