AI/TLDR

Grok 4.1 Fast

xAI's agentic tool-calling model: 2M-token context, a low-latency non-reasoning mode and a deeper reasoning mode, shipped with the Agent Tools API.

Overview

Grok 4.1 Fast is a large language model from xAI, part of the Grok Fast line and released on November 19, 2025 alongside the Agent Tools API. It is positioned as xAI's best tool-calling and agentic model — built for high-volume, real-world tasks like customer support and deep research rather than as the absolute top "brain" (that role belongs to Grok 4 / Grok 4 Heavy). It is tuned from the Grok 4.1 base, which itself emphasized lower hallucination and stronger conversational quality.

The model carries a 2-million-token context window and accepts both text and images, returning text. It is offered as a single model in two modes: a low-latency non-reasoning mode (model ID grok-4-1-fast-non-reasoning) for instant replies, and a reasoning mode (grok-4-1-fast-reasoning) that emits thinking tokens for multi-step problems. The mode is selected through a reasoning parameter in the API, and pricing is the same either way.

xAI trained Grok 4.1 Fast with heavy reinforcement learning across many simulated tool-use environments, which is reflected in its agentic benchmark results. At launch xAI made the model and the bundled Agent Tools API free for a two-week window (through December 3, 2025) via partners such as OpenRouter, before settling into low per-token pricing.

Released2025-11-19
LicenseProprietary
WeightsAPI only
Context2M
ArchitectureProprietary transformer trained with large-scale reinforcement learning on tool use across many simulated environments, tuned from the Grok 4.1 base. Exposes one model in two modes — a low-latency non-reasoning mode and a reasoning mode that emits thinking tokens — toggled by a reasoning parameter rather than separate weights.
ModalitiesText, Vision
StatusAvailable

Benchmarks

Agentic search benchmark comparison from xAI's Grok 4.1 Fast launch post: Grok 4.1 Fast (with Agent Tools API) vs GPT-5, Claude Sonnet 4.5, and Gemini 3 Pro on Reka Research-Eval, FRAMES, and X Browse (Score and Avg. Cost per query).

BenchmarkGrok 4.1 FastGPT-5Claude Sonnet 4.5Gemini 3 Pro
Reka Research-Eval (Score)63.9%45.5%41.2%55.9%
Reka Research-Eval (Avg. Cost)0.046 $0.107 $0.065 $
FRAMES (Score)87.6%86%85%90.9%
FRAMES (Avg. Cost)0.048 $0.058 $0.078 $
X Browse (Score)56.3%24.2%14.6%26.5%
X Browse (Avg. Cost)0.091 $0.198 $0.126 $

Comparison source ↗

This model's scores

  1. τ²-bench (Telecom) — agentic tool use100%
  2. Berkeley Function Calling Leaderboard v4 (BFCL-V4)72%
  3. Long-context (2M window) retrieval67%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.20 / 1M tokens per 1M tokens
Cached input$0.05 / 1M tokens per 1M tokens
Output$0.50 / 1M tokens per 1M tokens

Same price for reasoning and non-reasoning modes. The bundled Agent Tools API is billed separately at no more than $5 per 1,000 successful tool calls. Model and Agent Tools API were free through Dec 3, 2025 at launch.

Pricing source ↗

Strengths

  • Frontier-level tool calling and agentic workflows — top results on τ²-bench Telecom and Berkeley Function Calling v4
  • 2M-token context window that holds up on long-context retrieval far better than Grok 4
  • Two modes from one model: pick instant non-reasoning replies or deeper reasoning via a single API parameter
  • Roughly half the hallucination rate of the earlier Grok 4 Fast, at similar task accuracy
  • Very low, predictable pricing ($0.20/$0.50 per million tokens) with cheap cached input for high-volume agents
  • Ships with a bundled Agent Tools API (web/code/file tools) capped at no more than $5 per 1,000 successful tool calls

Best for

  • Customer-support and helpdesk agents that chain many tool calls
  • Deep-research assistants that fan out web search and synthesize long contexts
  • Multi-step agentic automation and orchestration over external tools/APIs
  • High-volume API workloads where latency and per-token cost matter
  • Long-document analysis and retrieval over codebases or large corpora (up to 2M tokens)
  • Vision-assisted agent steps that read screenshots, charts, or scanned pages (JPG/PNG)

How to access

ProviderModel ID
xAI API ↗grok-4-1-fast-reasoning / grok-4-1-fast-non-reasoning
OpenRouter ↗x-ai/grok-4.1-fast
Oracle Cloud (OCI Generative AI) ↗xai.grok-4-1-fast-reasoning / xai.grok-4-1-fast-non-reasoning

Grok Fast — every version

The full lineage of the Grok Fast line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Grok 4.1 Fastcurrent2025-11-19Proprietary
Grok 4 Fast2025-09Proprietary

FAQ

What is Grok 4.1 Fast?

Grok 4.1 Fast is xAI's agentic, tool-calling LLM released November 19, 2025. It is tuned for high-volume, real-world tasks like customer support and deep research, carries a 2M-token context window, and shipped together with xAI's Agent Tools API.

What is the difference between the reasoning and non-reasoning modes?

It is one model offered in two modes selected by an API parameter. The non-reasoning mode (grok-4-1-fast-non-reasoning) gives instant, low-latency replies; the reasoning mode (grok-4-1-fast-reasoning) emits thinking tokens for multi-step problems. Per-token pricing is the same for both.

How much does Grok 4.1 Fast cost?

Per xAI, it is $0.20 per million input tokens, $0.05 per million cached input tokens, and $0.50 per million output tokens. The bundled Agent Tools API is billed separately at no more than $5 per 1,000 successful tool calls. Both were free through December 3, 2025 at launch.

How good is Grok 4.1 Fast at tool calling?

xAI reports it scores 100% on τ²-bench Telecom and 72% on the Berkeley Function Calling Leaderboard v4, beating models like Claude Sonnet 4.5, GPT-5.1, and Grok 4 on those agentic benchmarks, and it sustains long-context quality (67%) far above Grok 4 (22%).