Overview
Grok 3 is the flagship large language model xAI released on February 17, 2025, succeeding Grok 2. xAI says Grok 3 was trained with roughly 10x the compute of Grok 2 on its Colossus supercomputer (reported to scale to around 200,000 NVIDIA H100 GPUs), and shipped alongside a smaller, faster Grok 3 mini.
The headline change in Grok 3 was explicit reasoning: a Think mode lets the model spend extra test-time compute to work through problems step by step, with the reasoning trace visible to the user. xAI also debuted DeepSearch, an agentic research mode that queries the web and X in real time to assemble summaries. The Grok 3 generation is text-only as an API model — native image input arrived later with Grok 4.
As a developer model, grok-3 (and the cheaper grok-3-mini) was served through the xAI API with a 131K-token context window. It was retired from the API on May 15, 2026, after which requests to the grok-3 slug redirect to grok-4.3 and bill at grok-4.3 pricing.
| Released | 2025-02-17 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Context | 131K tokens (xAI API). xAI marketed a 1M-token capability for the model. |
| Architecture | Transformer-based large language model. xAI says Grok 3 was trained with roughly 10x more compute than Grok 2 on its Colossus supercomputer (~200,000 NVIDIA H100 GPUs). A separate Think (reasoning) mode applies extra test-time compute. xAI has not published a parameter count. |
| Knowledge cutoff | November 2024 |
| Modalities | Text |
| Status | Retired from the xAI API on May 15, 2026 — requests to the grok-3 slug now redirect to (and bill as) grok-4.3. |
Benchmarks
- AIME 2025 (math, Think mode)93.3%
- GPQA Diamond (graduate science, Think mode)84.6%
- LiveCodeBench (code, Think mode)79.4%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $3.00 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.75 / 1M tokens per 1M tokens |
| Output | $15.00 / 1M tokens per 1M tokens |
grok-3 standard tier. A faster grok-3-fast tier was priced at $5 input / $25 output, and grok-3-mini at $0.30 input / $0.50 output. After the May 15, 2026 retirement, grok-3 requests bill at grok-4.3 rates ($1.25 / $2.50).
Strengths
- Strong reasoning in Think mode — 93.3% on AIME 2025, 84.6% on GPQA Diamond and 79.4% on LiveCodeBench (per xAI, with test-time compute)
- DeepSearch agentic web/X research baked into the Grok product
- Visible chain-of-thought: the Think trace is exposed, not hidden
- A cheaper, faster Grok 3 mini tier for cost-sensitive workloads
Best for
- Math and competition-style problem solving with Think mode
- Graduate-level science and technical Q&A
- Code generation and debugging
- Real-time research and summarization over the web and X via DeepSearch
- General-purpose chat assistance in the Grok app
How to access
| Provider | Model ID |
|---|---|
| xAI ↗ | grok-3 |
Grok (flagship) — every version
The full lineage of the Grok (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
When was Grok 3 released?
xAI released Grok 3, its flagship model, on February 17, 2025, alongside a smaller Grok 3 mini.
Is Grok 3 still available?
Not as a standalone API model. xAI retired grok-3 on May 15, 2026; requests to the grok-3 slug now redirect to grok-4.3 and bill at grok-4.3 pricing. The grok-3 name continues to resolve, so old code does not break.
What is Grok 3's Think mode?
Think mode is Grok 3's explicit reasoning setting. It spends extra test-time compute to reason through a problem step by step and exposes the reasoning trace. With this mode and additional test-time compute, xAI reported 93.3% on AIME 2025, 84.6% on GPQA Diamond and 79.4% on LiveCodeBench.
Does Grok 3 accept images?
No — the Grok 3 API model is text-only. Native image input came later, with Grok 4 in July 2025.