Overview
DeepSeek-R1 is DeepSeek's first-generation reasoning model, released January 20, 2025. It is a Mixture-of-Experts model built on the DeepSeek-V3-Base — 671 billion total parameters with roughly 37 billion active per token — and it inherits V3's 128K-token context window. DeepSeek released the weights as open weights under the permissive MIT license, which allows commercial use, fine-tuning, and distillation, making R1 the first openly available model to reach reasoning quality DeepSeek described as 'performance on par with OpenAI-o1'.
Unlike a standard chat model, DeepSeek-R1 produces a visible chain-of-thought before its final answer. It was post-trained from V3-Base with a multi-stage pipeline combining reinforcement learning and supervised fine-tuning; the companion model DeepSeek-R1-Zero was trained with pure RL and no SFT cold start, which the team used to show reasoning behavior can emerge from RL alone. The work was published as the paper 'DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning' (arXiv 2501.12948) and later appeared in Nature.
On reasoning benchmarks DeepSeek-R1 scored 79.8 on AIME 2024, 97.3 on MATH-500, 71.5 on GPQA Diamond, and a 2029 Codeforces rating (96.3rd percentile) — figures DeepSeek positioned as comparable to, and on some math and coding tests slightly ahead of, OpenAI's o1. Alongside the flagship, DeepSeek distilled R1's reasoning into six smaller dense models based on Qwen 2.5 and Llama 3 (1.5B to 70B), bringing R1-grade reasoning to commodity hardware. R1's open release and aggressive API pricing (about 90-95% cheaper than o1 at launch) sparked over 500 derivative models on Hugging Face within days.
| Released | 2025-01-20 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 671B total / 37B active (Mixture-of-Experts) |
| Context | 128K |
| Max output | 32,768 tokens (max generation length) |
| Architecture | Mixture-of-Experts transformer built on the DeepSeek-V3-Base — 671B total parameters with about 37B active per token. R1 is post-trained from V3-Base with a multi-stage pipeline: a reinforcement-learning cold start, two RL stages that discover and refine reasoning behavior, and two supervised fine-tuning stages, so the model exposes an explicit chain-of-thought before its final answer. Its predecessor R1-Zero was trained with pure RL and no SFT cold start. |
| Knowledge cutoff | Not officially disclosed |
| Modalities | Text |
| Status | Open weights still available on Hugging Face. Superseded on DeepSeek's first-party API — the deepseek-reasoner alias that launched with R1 was later remapped to newer models (V3.2, and is scheduled to point to V4-Flash after 2026-07-24); the original R1 weights are still hosted by third parties such as OpenRouter. |
Benchmarks
- AIME 2024 (Pass@1)79.8%
- MATH-500 (Pass@1)97.3%
- GPQA Diamond (Pass@1)71.5%
- LiveCodeBench (Pass@1-COT)65.9%
- Codeforces Percentile96.3%
- SWE-bench Verified (Resolved)49.2%
- Aider-Polyglot53.3%
- MMLU (Pass@1)90.8%
- MMLU-Pro (EM)84%
- MMLU-Redux (EM)92.9%
- AlpacaEval 2.0 (LC-winrate)87.6%
- Arena-Hard (vs GPT-4-1106)92.3%
- FRAMES (Acc.)82.5%
- IF-Eval (Prompt Strict)83.3%
- SimpleQA (Correct)30.1%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.55 / 1M tokens (cache miss; $0.14 cache hit) per 1M tokens |
|---|---|
| Output | $2.19 / 1M tokens per 1M tokens |
DeepSeek's first-party launch pricing for R1 (deepseek-reasoner), effective at the January 20, 2025 release — roughly 90-95% cheaper than OpenAI o1 at the time. That alias has since been remapped to newer models, so the original R1 is no longer the model served at this endpoint. Third-party hosts still serve the original R1 weights; OpenRouter, for example, lists $0.70 in / $2.50 out per 1M tokens.
Strengths
- Open weights under the permissive MIT license — free for commercial use, self-hosting, fine-tuning, and distillation
- First openly available reasoning model at o1-class quality: AIME 2024 79.8, MATH-500 97.3, GPQA Diamond 71.5
- Strong competition coding — Codeforces rating 2029 (96.3rd percentile), LiveCodeBench 65.9
- Visible chain-of-thought reasoning that effectively self-checks before answering
- Launched roughly 90-95% cheaper per token than OpenAI o1, undercutting closed reasoning models
- Shipped with six distilled dense models (1.5B-70B, Qwen 2.5 / Llama 3) that run on commodity hardware
Best for
- Competition-style math and multi-step logical reasoning
- Coding and software-engineering tasks (LiveCodeBench, SWE-bench Verified, competitive programming)
- Self-hosted reasoning deployments where an open MIT-licensed model is required
- Distillation: using R1's chain-of-thought traces to train smaller, cheaper student models
- STEM question answering and knowledge tasks (GPQA, MMLU-Pro)
- Research into reinforcement-learning-driven reasoning, building on the open paper and weights
How to access
| Provider | Model ID |
|---|---|
| DeepSeek Platform (historical alias) ↗ | deepseek-reasoner |
| OpenRouter ↗ | deepseek/deepseek-r1 |
DeepSeek R1 — every version
The full lineage of the DeepSeek R1 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-R1-0528current | 2025-05-28 | — | MIT |
| DeepSeek-R1 | 2025-01-20 | — | MIT |
| DeepSeek-R1-Zero | 2025-01-20 | — | MIT |
FAQ
What is DeepSeek-R1 and when was it released?
DeepSeek-R1 is DeepSeek's first-generation reasoning model, released on January 20, 2025. It is a Mixture-of-Experts model with 671 billion total parameters (about 37 billion active per token) built on the DeepSeek-V3-Base, and it produces a visible chain-of-thought before its final answer. DeepSeek described its quality as on par with OpenAI o1, and it was the first openly available model to reach that level of reasoning.
Is DeepSeek-R1 open source and free to use?
The weights are released under the MIT license on Hugging Face, so you can download, self-host, fine-tune, distill, and use them commercially for free. At launch DeepSeek also served it via a hosted API as deepseek-reasoner; that alias has since been remapped to newer models, but third parties such as OpenRouter still serve the original R1 weights for a per-token fee.
How does DeepSeek-R1 compare to OpenAI o1?
DeepSeek positioned R1 as 'performance on par with OpenAI-o1,' and on some tests it edged ahead: R1 scored 79.8 on AIME 2024 (vs o1's 79.2) and 97.3 on MATH-500 (vs 96.4), and DeepSeek also claimed wins on SWE-bench Verified. At launch R1's API was roughly 90-95% cheaper per token than o1.
What are the DeepSeek-R1 distilled models?
Alongside the 671B flagship, DeepSeek released six smaller dense (non-MoE) models distilled from R1's reasoning traces, based on Alibaba's Qwen 2.5 and Meta's Llama 3 families and ranging from 1.5B to 70B parameters. They bring R1-style reasoning to commodity hardware, with DeepSeek noting its 32B and 70B distills were on par with OpenAI o1-mini. All are MIT-licensed.