Overview
DeepSeek-R1-0528 is the May 28, 2025 update to DeepSeek's first-generation reasoning line, DeepSeek R1. Like the original R1, it is a Mixture-of-Experts model built on the DeepSeek-V3 base — 671 billion total parameters with roughly 37 billion active per token (the Hugging Face card lists 685B, which includes the multi-token-prediction module). It is released as open weights under the permissive MIT license, which allows commercial use and distillation.
The 0528 revision is a same-architecture refresh rather than a new model: DeepSeek added post-training compute and algorithmic optimizations that deepen the model's reasoning. Average reasoning length per question roughly doubled (about 12K to 23K tokens on AIME), and benchmark scores jumped accordingly — AIME 2025 rose from 70.0 to 87.5, AIME 2024 from 79.8 to 91.4, and LiveCodeBench from 63.5 to 73.3. DeepSeek also reports a reduced hallucination rate, native support for JSON output and function calling, and system-prompt support, with no need to prepend a thinking tag to trigger reasoning.
DeepSeek-R1-0528 has a 163,840-token (about 164K) context window and is text-only — no vision, audio, or PDF input. Alongside the flagship, DeepSeek distilled its chain-of-thought into a small model, DeepSeek-R1-0528-Qwen3-8B (fine-tuned on Qwen3-8B Base), which reaches 86.0 on AIME 2024 — state-of-the-art among open 8B models. The weights are on Hugging Face and the hosted API exposes the model as deepseek-reasoner; it was positioned as an open challenger to OpenAI o3 and Gemini 2.5 Pro.
| Released | 2025-05-28 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 671B total / 37B active (685B on Hugging Face incl. MTP module) |
| Context | 164K |
| Max output | 32K tokens (64K max generation length) |
| Architecture | Mixture-of-Experts transformer built on the DeepSeek-V3 base, post-trained with large-scale reinforcement learning to expose chain-of-thought. The 0528 update added extra post-training compute and algorithmic optimizations that roughly doubled per-question reasoning depth (about 12K to 23K tokens on AIME). |
| Knowledge cutoff | Not officially disclosed |
| Modalities | Text |
| Status | Generally available |
Benchmarks
- AIME 2024 (Pass@1)91.4%
- AIME 2025 (Pass@1)87.5%
- HMMT 2025 (Pass@1)79.4%
- CNMO 2024 (Pass@1)86.9%
- GPQA Diamond (Pass@1)81%
- Humanity's Last Exam (Pass@1)17.7%
- LiveCodeBench (Pass@1)73.3%
- Codeforces Rating1930
- SWE Verified (Resolved)57.6%
- Aider-Polyglot71.6%
- MMLU-Pro (EM)85%
- MMLU-Redux (EM)93.4%
- FRAMES (Acc.)83%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.50 / 1M tokens per 1M tokens |
|---|---|
| Output | $2.15 / 1M tokens per 1M tokens |
Standard OpenRouter rate for the open-weight DeepSeek-R1-0528. Other hosts price it differently (Artificial Analysis lists a $1.35 in / $4.20 out provider). DeepSeek's first-party API serves it as deepseek-reasoner; that endpoint's current pricing now reflects a newer model, so the OpenRouter figure is cited here.
Strengths
- Open weights under the permissive MIT license — free for commercial use, self-hosting, and distillation
- Large gains in deep reasoning over the original R1: AIME 2025 87.5 (up from 70.0), AIME 2024 91.4, HMMT 2025 79.4
- Strong competition math and coding — Codeforces rating ~1930, LiveCodeBench 73.3, Aider-Polyglot 71.6
- Reduced hallucination rate versus the original R1, per DeepSeek
- Native JSON output and function-calling support, plus system-prompt support without a manual thinking tag
- Distilled DeepSeek-R1-0528-Qwen3-8B brings R1-grade reasoning to commodity hardware (86.0 on AIME 2024)
Best for
- Competition-style math and multi-step logical reasoning
- Coding and software-engineering tasks (LiveCodeBench, SWE-bench, Aider-style edits)
- Agentic tool use and function-calling workflows that need structured JSON output
- Self-hosted reasoning deployments where an open MIT-licensed model is required
- Distillation: using R1-0528's chain-of-thought to train smaller, cheaper student models
How to access
| Provider | Model ID |
|---|---|
| DeepSeek Platform ↗ | deepseek-reasoner |
| OpenRouter ↗ | deepseek/deepseek-r1-0528 |
DeepSeek R1 — every version
The full lineage of the DeepSeek R1 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-R1-0528current | 2025-05-28 | — | MIT |
| DeepSeek-R1 | 2025-01-20 | — | MIT |
| DeepSeek-R1-Zero | 2025-01-20 | — | MIT |
FAQ
What changed in DeepSeek-R1-0528 versus the original DeepSeek-R1?
DeepSeek-R1-0528 keeps the same Mixture-of-Experts architecture on the V3 base but adds post-training compute and algorithmic tuning that deepen its reasoning. Per-question reasoning roughly doubled (about 12K to 23K tokens on AIME), and scores rose across the board — AIME 2025 from 70.0 to 87.5, AIME 2024 from 79.8 to 91.4, and LiveCodeBench from 63.5 to 73.3. DeepSeek also reports fewer hallucinations and added JSON output and function-calling support.
Is DeepSeek-R1-0528 open source and free to use?
The weights are released under the MIT license on Hugging Face, so you can download, self-host, fine-tune, distill, and use them commercially for free. DeepSeek also offers a hosted API (where the model is exposed as deepseek-reasoner), and several third parties such as OpenRouter serve it for a per-token fee.
How big is DeepSeek-R1-0528 and what is its context window?
It is a Mixture-of-Experts model with 671 billion total parameters and about 37 billion active per token (the Hugging Face card lists 685B, which includes the multi-token-prediction module). Its context window is 163,840 tokens — roughly 164K — and it is text-only.
What is DeepSeek-R1-0528-Qwen3-8B?
It is a small distilled model released alongside the flagship: DeepSeek used chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3-8B Base. It scores 86.0 on AIME 2024 — state-of-the-art among open-source 8B models and, per DeepSeek, comparable to the much larger Qwen3-235B-thinking. It is also MIT-licensed.