AI/TLDR

DeepSeek-R1

DeepSeek's first open-weight reasoning model — performance on par with OpenAI o1 on math, code, and STEM, under a permissive MIT license.

Overview

DeepSeek-R1 is DeepSeek's first-generation reasoning model, released January 20, 2025. It is a Mixture-of-Experts model built on the DeepSeek-V3-Base — 671 billion total parameters with roughly 37 billion active per token — and it inherits V3's 128K-token context window. DeepSeek released the weights as open weights under the permissive MIT license, which allows commercial use, fine-tuning, and distillation, making R1 the first openly available model to reach reasoning quality DeepSeek described as 'performance on par with OpenAI-o1'.

Unlike a standard chat model, DeepSeek-R1 produces a visible chain-of-thought before its final answer. It was post-trained from V3-Base with a multi-stage pipeline combining reinforcement learning and supervised fine-tuning; the companion model DeepSeek-R1-Zero was trained with pure RL and no SFT cold start, which the team used to show reasoning behavior can emerge from RL alone. The work was published as the paper 'DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning' (arXiv 2501.12948) and later appeared in Nature.

On reasoning benchmarks DeepSeek-R1 scored 79.8 on AIME 2024, 97.3 on MATH-500, 71.5 on GPQA Diamond, and a 2029 Codeforces rating (96.3rd percentile) — figures DeepSeek positioned as comparable to, and on some math and coding tests slightly ahead of, OpenAI's o1. Alongside the flagship, DeepSeek distilled R1's reasoning into six smaller dense models based on Qwen 2.5 and Llama 3 (1.5B to 70B), bringing R1-grade reasoning to commodity hardware. R1's open release and aggressive API pricing (about 90-95% cheaper than o1 at launch) sparked over 500 derivative models on Hugging Face within days.

Released2025-01-20
LicenseMIT
WeightsOpen weights
Parameters671B total / 37B active (Mixture-of-Experts)
Context128K
Max output32,768 tokens (max generation length)
ArchitectureMixture-of-Experts transformer built on the DeepSeek-V3-Base — 671B total parameters with about 37B active per token. R1 is post-trained from V3-Base with a multi-stage pipeline: a reinforcement-learning cold start, two RL stages that discover and refine reasoning behavior, and two supervised fine-tuning stages, so the model exposes an explicit chain-of-thought before its final answer. Its predecessor R1-Zero was trained with pure RL and no SFT cold start.
Knowledge cutoffNot officially disclosed
ModalitiesText
StatusOpen weights still available on Hugging Face. Superseded on DeepSeek's first-party API — the deepseek-reasoner alias that launched with R1 was later remapped to newer models (V3.2, and is scheduled to point to V4-Flash after 2026-07-24); the original R1 weights are still hosted by third parties such as OpenRouter.

Benchmarks

  1. AIME 2024 (Pass@1)79.8%
  2. MATH-500 (Pass@1)97.3%
  3. GPQA Diamond (Pass@1)71.5%
  4. LiveCodeBench (Pass@1-COT)65.9%
  5. Codeforces Percentile96.3%
  6. SWE-bench Verified (Resolved)49.2%
  7. Aider-Polyglot53.3%
  8. MMLU (Pass@1)90.8%
  9. MMLU-Pro (EM)84%
  10. MMLU-Redux (EM)92.9%
  11. AlpacaEval 2.0 (LC-winrate)87.6%
  12. Arena-Hard (vs GPT-4-1106)92.3%
  13. FRAMES (Acc.)82.5%
  14. IF-Eval (Prompt Strict)83.3%
  15. SimpleQA (Correct)30.1%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.55 / 1M tokens (cache miss; $0.14 cache hit) per 1M tokens
Output$2.19 / 1M tokens per 1M tokens

DeepSeek's first-party launch pricing for R1 (deepseek-reasoner), effective at the January 20, 2025 release — roughly 90-95% cheaper than OpenAI o1 at the time. That alias has since been remapped to newer models, so the original R1 is no longer the model served at this endpoint. Third-party hosts still serve the original R1 weights; OpenRouter, for example, lists $0.70 in / $2.50 out per 1M tokens.

Pricing source ↗

Strengths

  • Open weights under the permissive MIT license — free for commercial use, self-hosting, fine-tuning, and distillation
  • First openly available reasoning model at o1-class quality: AIME 2024 79.8, MATH-500 97.3, GPQA Diamond 71.5
  • Strong competition coding — Codeforces rating 2029 (96.3rd percentile), LiveCodeBench 65.9
  • Visible chain-of-thought reasoning that effectively self-checks before answering
  • Launched roughly 90-95% cheaper per token than OpenAI o1, undercutting closed reasoning models
  • Shipped with six distilled dense models (1.5B-70B, Qwen 2.5 / Llama 3) that run on commodity hardware

Best for

  • Competition-style math and multi-step logical reasoning
  • Coding and software-engineering tasks (LiveCodeBench, SWE-bench Verified, competitive programming)
  • Self-hosted reasoning deployments where an open MIT-licensed model is required
  • Distillation: using R1's chain-of-thought traces to train smaller, cheaper student models
  • STEM question answering and knowledge tasks (GPQA, MMLU-Pro)
  • Research into reinforcement-learning-driven reasoning, building on the open paper and weights

How to access

ProviderModel ID
DeepSeek Platform (historical alias) ↗deepseek-reasoner
OpenRouter ↗deepseek/deepseek-r1

DeepSeek R1 — every version

The full lineage of the DeepSeek R1 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-R1-0528current2025-05-28MIT
DeepSeek-R12025-01-20MIT
DeepSeek-R1-Zero2025-01-20MIT

FAQ

What is DeepSeek-R1 and when was it released?

DeepSeek-R1 is DeepSeek's first-generation reasoning model, released on January 20, 2025. It is a Mixture-of-Experts model with 671 billion total parameters (about 37 billion active per token) built on the DeepSeek-V3-Base, and it produces a visible chain-of-thought before its final answer. DeepSeek described its quality as on par with OpenAI o1, and it was the first openly available model to reach that level of reasoning.

Is DeepSeek-R1 open source and free to use?

The weights are released under the MIT license on Hugging Face, so you can download, self-host, fine-tune, distill, and use them commercially for free. At launch DeepSeek also served it via a hosted API as deepseek-reasoner; that alias has since been remapped to newer models, but third parties such as OpenRouter still serve the original R1 weights for a per-token fee.

How does DeepSeek-R1 compare to OpenAI o1?

DeepSeek positioned R1 as 'performance on par with OpenAI-o1,' and on some tests it edged ahead: R1 scored 79.8 on AIME 2024 (vs o1's 79.2) and 97.3 on MATH-500 (vs 96.4), and DeepSeek also claimed wins on SWE-bench Verified. At launch R1's API was roughly 90-95% cheaper per token than o1.

What are the DeepSeek-R1 distilled models?

Alongside the 671B flagship, DeepSeek released six smaller dense (non-MoE) models distilled from R1's reasoning traces, based on Alibaba's Qwen 2.5 and Meta's Llama 3 families and ranging from 1.5B to 70B parameters. They bring R1-style reasoning to commodity hardware, with DeepSeek noting its 32B and 70B distills were on par with OpenAI o1-mini. All are MIT-licensed.