AI/TLDR

Grok 4

xAI's frontier reasoning model, RL-trained for native tool use and real-time X search, with a Grok 4 Heavy multi-agent tier.

Overview

Grok 4 is xAI's flagship large language model, released July 9, 2025 as the company's first model positioned as a frontier reasoning system. It is the successor to Grok 3 and the foundation of the later Grok 4.1 and Grok 4.3 releases. Grok 4 was trained with roughly 10x more reinforcement-learning compute than Grok 3, using xAI's Colossus cluster of about 200,000 GPUs, and reinforcement learning was applied at pretraining scale to make tool use a first-class capability rather than an add-on.

What sets Grok 4 apart is native, RL-trained tool use: the model reasons while calling a code interpreter and searching the web and X (formerly Twitter) in real time, then folds the results back into its answer. It accepts text and image input and returns text. xAI shipped it in two forms — standard Grok 4 (a single reasoning agent) and Grok 4 Heavy, a multi-agent tier in which several copies of the model work a problem in parallel and reconcile their answers for higher accuracy on the hardest tasks.

At launch Grok 4 posted frontier scores across reasoning, math, and science benchmarks, and Grok 4 Heavy was the first system to clear 50% on Humanity's Last Exam (text-only subset). The standard model is served via the xAI API as grok-4-0709 with a 256K-token context window (128K in the Grok consumer app). It has since been superseded by cheaper, faster Grok 4.1 and 4.3 releases but remains accessible as a legacy model.

Released2025-07-09
LicenseProprietary
WeightsAPI only
Context256K
Max output8K
ArchitectureBuilt on the sixth generation of xAI's foundation model and trained with roughly 10x more reinforcement-learning compute than Grok 3 on the Colossus 200,000-GPU cluster. xAI ran RL at pretraining scale to teach Grok 4 to natively use tools (code execution and web/X search) while reasoning. The companion Grok 4 Heavy tier runs multiple agents in parallel on the same model and compares their work to reach an answer. Exact parameter count is not disclosed by xAI.
Knowledge cutoffNovember 2024
ModalitiesText, Vision
StatusAvailable (legacy; superseded by Grok 4.1/4.3)

Benchmarks

  1. Humanity's Last Exam (with tools)41%
  2. GPQA87.5%
  3. AIME 202591.7%
  4. HMMT 202590%
  5. LiveCodeBench (Jan–May)79%
  6. USAMO 202537.5%
  7. ARC-AGI-215.9%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$3.00 / 1M tokens per 1M tokens
Output$15.00 / 1M tokens per 1M tokens

xAI list price for the grok-4-0709 model. The standard Grok 4 tier; the Grok 4 Heavy multi-agent tier is offered through the SuperGrok Heavy consumer plan ($300/month).

Pricing source ↗

Strengths

  • Frontier reasoning, math, and graduate-level science performance at launch (GPQA 87.5%, AIME 2025 91.7%, HMMT 2025 90.0%)
  • Native, reinforcement-learning-trained tool use — code execution plus real-time web and X search woven into its reasoning
  • Strong abstract-reasoning generalization: first model to break the single-digit wall on ARC-AGI-2 (15.9%, independently verified by the ARC Prize Foundation)
  • Grok 4 Heavy multi-agent tier pushes the hardest benchmarks higher (50.7% on Humanity's Last Exam, 100% on AIME 2025)
  • Large 256K-token API context window for long documents and extended agentic sessions
  • Vision (image) input alongside text

Best for

  • Hard STEM and competition-math problem solving (AIME/HMMT/USAMO-style reasoning)
  • Graduate-level science Q&A and research assistance
  • Agentic workflows that need a model to plan, run code, and search the web/X mid-reasoning
  • Coding and code review aided by tool use and a large context window
  • Long-document analysis using the 256K-token context window
  • Research-grade tasks where Grok 4 Heavy's parallel multi-agent accuracy is worth the higher cost

How to access

ProviderModel ID
xAI ↗grok-4-0709

Grok (flagship) — every version

The full lineage of the Grok (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Grok 4.3current2026-04-301MProprietary
Grok 4.202026-03Proprietary
Grok 4.12025-11-17Proprietary
Grok 42025-07-09Proprietary
Grok 32025-02-17Proprietary
Grok 22024-08-20Open weights
Grok 1.52024-05-15Proprietary
Grok 12023-11-03Apache-2.0

FAQ

When was Grok 4 released?

xAI released Grok 4 on July 9, 2025, announcing it in a livestream alongside the Grok 4 Heavy multi-agent tier. The API model id is grok-4-0709.

What is the difference between Grok 4 and Grok 4 Heavy?

Standard Grok 4 is a single reasoning agent. Grok 4 Heavy runs several copies of the model in parallel on the same problem and reconciles their answers, which raises accuracy on the hardest benchmarks — for example it reached 50.7% on Humanity's Last Exam and 100% on AIME 2025, versus 41.0% and 91.7% for standard Grok 4. Heavy is offered through xAI's SuperGrok Heavy plan.

How much does Grok 4 cost via the API?

xAI lists the standard Grok 4 model (grok-4-0709) at $3.00 per million input tokens and $15.00 per million output tokens. Newer Grok 4.1 and 4.3 releases are cheaper, so Grok 4 is now a legacy option.

What is Grok 4's context window and knowledge cutoff?

Grok 4 has a 256K-token context window in the API (128K in the Grok consumer app) and a maximum output of about 8K tokens. xAI's documentation lists a knowledge cutoff of November 2024. It accepts text and image input and returns text.