Overview
Grok 4 is xAI's flagship large language model, released July 9, 2025 as the company's first model positioned as a frontier reasoning system. It is the successor to Grok 3 and the foundation of the later Grok 4.1 and Grok 4.3 releases. Grok 4 was trained with roughly 10x more reinforcement-learning compute than Grok 3, using xAI's Colossus cluster of about 200,000 GPUs, and reinforcement learning was applied at pretraining scale to make tool use a first-class capability rather than an add-on.
What sets Grok 4 apart is native, RL-trained tool use: the model reasons while calling a code interpreter and searching the web and X (formerly Twitter) in real time, then folds the results back into its answer. It accepts text and image input and returns text. xAI shipped it in two forms — standard Grok 4 (a single reasoning agent) and Grok 4 Heavy, a multi-agent tier in which several copies of the model work a problem in parallel and reconcile their answers for higher accuracy on the hardest tasks.
At launch Grok 4 posted frontier scores across reasoning, math, and science benchmarks, and Grok 4 Heavy was the first system to clear 50% on Humanity's Last Exam (text-only subset). The standard model is served via the xAI API as grok-4-0709 with a 256K-token context window (128K in the Grok consumer app). It has since been superseded by cheaper, faster Grok 4.1 and 4.3 releases but remains accessible as a legacy model.
| Released | 2025-07-09 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Context | 256K |
| Max output | 8K |
| Architecture | Built on the sixth generation of xAI's foundation model and trained with roughly 10x more reinforcement-learning compute than Grok 3 on the Colossus 200,000-GPU cluster. xAI ran RL at pretraining scale to teach Grok 4 to natively use tools (code execution and web/X search) while reasoning. The companion Grok 4 Heavy tier runs multiple agents in parallel on the same model and compares their work to reach an answer. Exact parameter count is not disclosed by xAI. |
| Knowledge cutoff | November 2024 |
| Modalities | Text, Vision |
| Status | Available (legacy; superseded by Grok 4.1/4.3) |
Benchmarks
- Humanity's Last Exam (with tools)41%
- GPQA87.5%
- AIME 202591.7%
- HMMT 202590%
- LiveCodeBench (Jan–May)79%
- USAMO 202537.5%
- ARC-AGI-215.9%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $3.00 / 1M tokens per 1M tokens |
|---|---|
| Output | $15.00 / 1M tokens per 1M tokens |
xAI list price for the grok-4-0709 model. The standard Grok 4 tier; the Grok 4 Heavy multi-agent tier is offered through the SuperGrok Heavy consumer plan ($300/month).
Strengths
- Frontier reasoning, math, and graduate-level science performance at launch (GPQA 87.5%, AIME 2025 91.7%, HMMT 2025 90.0%)
- Native, reinforcement-learning-trained tool use — code execution plus real-time web and X search woven into its reasoning
- Strong abstract-reasoning generalization: first model to break the single-digit wall on ARC-AGI-2 (15.9%, independently verified by the ARC Prize Foundation)
- Grok 4 Heavy multi-agent tier pushes the hardest benchmarks higher (50.7% on Humanity's Last Exam, 100% on AIME 2025)
- Large 256K-token API context window for long documents and extended agentic sessions
- Vision (image) input alongside text
Best for
- Hard STEM and competition-math problem solving (AIME/HMMT/USAMO-style reasoning)
- Graduate-level science Q&A and research assistance
- Agentic workflows that need a model to plan, run code, and search the web/X mid-reasoning
- Coding and code review aided by tool use and a large context window
- Long-document analysis using the 256K-token context window
- Research-grade tasks where Grok 4 Heavy's parallel multi-agent accuracy is worth the higher cost
How to access
| Provider | Model ID |
|---|---|
| xAI ↗ | grok-4-0709 |
Grok (flagship) — every version
The full lineage of the Grok (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
When was Grok 4 released?
xAI released Grok 4 on July 9, 2025, announcing it in a livestream alongside the Grok 4 Heavy multi-agent tier. The API model id is grok-4-0709.
What is the difference between Grok 4 and Grok 4 Heavy?
Standard Grok 4 is a single reasoning agent. Grok 4 Heavy runs several copies of the model in parallel on the same problem and reconciles their answers, which raises accuracy on the hardest benchmarks — for example it reached 50.7% on Humanity's Last Exam and 100% on AIME 2025, versus 41.0% and 91.7% for standard Grok 4. Heavy is offered through xAI's SuperGrok Heavy plan.
How much does Grok 4 cost via the API?
xAI lists the standard Grok 4 model (grok-4-0709) at $3.00 per million input tokens and $15.00 per million output tokens. Newer Grok 4.1 and 4.3 releases are cheaper, so Grok 4 is now a legacy option.
What is Grok 4's context window and knowledge cutoff?
Grok 4 has a 256K-token context window in the API (128K in the Grok consumer app) and a maximum output of about 8K tokens. xAI's documentation lists a knowledge cutoff of November 2024. It accepts text and image input and returns text.