AI/TLDR

DeepSeek-R1-0528

DeepSeek's open-weight R1 upgrade with deeper reasoning, fewer hallucinations, and function calling — AIME 2025 up from 70 to 87.5.

Overview

DeepSeek-R1-0528 is the May 28, 2025 update to DeepSeek's first-generation reasoning line, DeepSeek R1. Like the original R1, it is a Mixture-of-Experts model built on the DeepSeek-V3 base — 671 billion total parameters with roughly 37 billion active per token (the Hugging Face card lists 685B, which includes the multi-token-prediction module). It is released as open weights under the permissive MIT license, which allows commercial use and distillation.

The 0528 revision is a same-architecture refresh rather than a new model: DeepSeek added post-training compute and algorithmic optimizations that deepen the model's reasoning. Average reasoning length per question roughly doubled (about 12K to 23K tokens on AIME), and benchmark scores jumped accordingly — AIME 2025 rose from 70.0 to 87.5, AIME 2024 from 79.8 to 91.4, and LiveCodeBench from 63.5 to 73.3. DeepSeek also reports a reduced hallucination rate, native support for JSON output and function calling, and system-prompt support, with no need to prepend a thinking tag to trigger reasoning.

DeepSeek-R1-0528 has a 163,840-token (about 164K) context window and is text-only — no vision, audio, or PDF input. Alongside the flagship, DeepSeek distilled its chain-of-thought into a small model, DeepSeek-R1-0528-Qwen3-8B (fine-tuned on Qwen3-8B Base), which reaches 86.0 on AIME 2024 — state-of-the-art among open 8B models. The weights are on Hugging Face and the hosted API exposes the model as deepseek-reasoner; it was positioned as an open challenger to OpenAI o3 and Gemini 2.5 Pro.

Released: 2025-05-28
License: MIT
Weights: Open weights
Parameters: 671B total / 37B active (685B on Hugging Face incl. MTP module)
Context: 164K
Max output: 32K tokens (64K max generation length)
Architecture: Mixture-of-Experts transformer built on the DeepSeek-V3 base, post-trained with large-scale reinforcement learning to expose chain-of-thought. The 0528 update added extra post-training compute and algorithmic optimizations that roughly doubled per-question reasoning depth (about 12K to 23K tokens on AIME).
Knowledge cutoff: Not officially disclosed
Modalities: Text
Status: Generally available
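
The context and output figures above suggest a simple token-budget check. The following is a minimal sketch, assuming that prompt and generated tokens share the single 163,840-token window and that "32K" means 32 * 1024 tokens; neither assumption is confirmed by this page, and the function names are illustrative.

```python
# Constants taken from the spec list above. The exact value of "32K"
# (32 * 1024) is an assumption, not a documented figure.
CONTEXT_WINDOW = 163_840      # tokens, as stated for DeepSeek-R1-0528
MAX_OUTPUT = 32 * 1024        # assumed meaning of "32K tokens"


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves room for the reserved output.

    Assumes prompt and generated tokens share one context window.
    """
    if not 0 < reserved_output <= CONTEXT_WINDOW:
        raise ValueError("reserved_output must be within the context window")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if a prompt of this size fits alongside the reserved output."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)
```

Reserving output space up front matters more for a reasoning model than for a standard chat model, since the reasoning tokens themselves consume part of the generation budget.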

Benchmarks

  1. AIME 2024 (Pass@1): 91.4%
  2. AIME 2025 (Pass@1): 87.5%
  3. HMMT 2025 (Pass@1): 79.4%
  4. CNMO 2024 (Pass@1): 86.9%
  5. GPQA Diamond (Pass@1): 81%
  6. Humanity's Last Exam (Pass@1): 17.7%
  7. LiveCodeBench (Pass@1): 73.3%
  8. Codeforces Rating: 1930
  9. SWE Verified (Resolved): 57.6%
  10. Aider-Polyglot: 71.6%
  11. MMLU-Pro (EM): 85%
  12. MMLU-Redux (EM): 93.4%
  13. FRAMES (Acc.): 83%

Scores are percentages, except Codeforces, which is a rating; higher is better.

Pricing

Input: $0.50 per 1M tokens
Output: $2.15 per 1M tokens

Standard OpenRouter rate for the open-weight DeepSeek-R1-0528. Other hosts price it differently; Artificial Analysis, for example, lists a provider at $1.35 in / $4.20 out. DeepSeek's first-party API serves it as deepseek-reasoner, but that endpoint's pricing now reflects a newer model, so the OpenRouter figure is cited here.

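
To make the per-token rates concrete, here is a rough cost estimator. It is a sketch only: it uses the OpenRouter figures quoted above and assumes reasoning tokens are billed at the output rate, which this page does not state. The function name is hypothetical.

```python
# Rates copied from the pricing lines above (OpenRouter, USD per 1M tokens).
INPUT_USD_PER_M = 0.50
OUTPUT_USD_PER_M = 2.15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: reasoning tokens count as output tokens for billing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 1,000-token prompt with a 23,000-token response, roughly the
# average reasoning length reported for AIME questions.
print(f"${estimate_cost_usd(1_000, 23_000):.4f}")
```

Because the model's average reasoning length roughly doubled in this revision, output tokens are likely to dominate the bill for hard problems.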

Strengths

  • Open weights under the permissive MIT license — free for commercial use, self-hosting, and distillation
  • Large gains in deep reasoning over the original R1: AIME 2025 87.5 (up from 70.0), AIME 2024 91.4, HMMT 2025 79.4
  • Strong competition math and coding — Codeforces rating ~1930, LiveCodeBench 73.3, Aider-Polyglot 71.6
  • Reduced hallucination rate versus the original R1, per DeepSeek
  • Native JSON output and function-calling support, plus system-prompt support without a manual thinking tag
  • Distilled DeepSeek-R1-0528-Qwen3-8B brings R1-grade reasoning to commodity hardware (86.0 on AIME 2024)

Best for

  • Competition-style math and multi-step logical reasoning
  • Coding and software-engineering tasks (LiveCodeBench, SWE-bench, Aider-style edits)
  • Agentic tool use and function-calling workflows that need structured JSON output
  • Self-hosted reasoning deployments where an open MIT-licensed model is required
  • Distillation: using R1-0528's chain-of-thought to train smaller, cheaper student models
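
For the function-calling workflows mentioned above, the following sketch shows one way a tool definition and a local dispatcher might look. It assumes the OpenAI-style "tools" schema that many OpenAI-compatible hosts accept; whether a given provider of DeepSeek-R1-0528 accepts exactly this shape is an assumption. The get_weather tool and the mock tool call are hypothetical, and no network request is made.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns a canned string instead of real data."""
    return f"Weather for {city}: (stub)"


# Tool definition in the OpenAI-style schema (assumed, not confirmed here).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run the local function named in a tool call and return its result."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    if fn["name"] not in REGISTRY:
        raise KeyError(f"unknown tool: {fn['name']}")
    return REGISTRY[fn["name"]](**args)


# Hand-written stand-in for a tool call a model might return.
mock_call = {"function": {"name": "get_weather",
                          "arguments": '{"city": "Paris"}'}}
result = dispatch(mock_call)
```

Keeping an explicit registry means the model can only trigger functions that were deliberately exposed, which is a sensible default for agentic use.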

How to access

Provider: Model ID
DeepSeek Platform: deepseek-reasoner
OpenRouter: deepseek/deepseek-r1-0528

DeepSeek R1 — every version

The full lineage of the DeepSeek R1 line, newest first.

Version | Released | Context | License
DeepSeek-R1-0528 (current) | 2025-05-28 | not listed | MIT
DeepSeek-R1 | 2025-01-20 | not listed | MIT
DeepSeek-R1-Zero | 2025-01-20 | not listed | MIT

FAQ

What changed in DeepSeek-R1-0528 versus the original DeepSeek-R1?

DeepSeek-R1-0528 keeps the same Mixture-of-Experts architecture on the V3 base but adds post-training compute and algorithmic tuning that deepen its reasoning. Per-question reasoning roughly doubled (about 12K to 23K tokens on AIME), and scores rose across the board — AIME 2025 from 70.0 to 87.5, AIME 2024 from 79.8 to 91.4, and LiveCodeBench from 63.5 to 73.3. DeepSeek also reports fewer hallucinations and added JSON output and function-calling support.

Is DeepSeek-R1-0528 open source and free to use?

The weights are released under the MIT license on Hugging Face, so you can download, self-host, fine-tune, distill, and use them commercially for free. DeepSeek also offers a hosted API (where the model is exposed as deepseek-reasoner), and several third parties such as OpenRouter serve it for a per-token fee.

How big is DeepSeek-R1-0528 and what is its context window?

It is a Mixture-of-Experts model with 671 billion total parameters and about 37 billion active per token (the Hugging Face card lists 685B, which includes the multi-token-prediction module). Its context window is 163,840 tokens — roughly 164K — and it is text-only.

What is DeepSeek-R1-0528-Qwen3-8B?

It is a small distilled model released alongside the flagship: DeepSeek used chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3-8B Base. It scores 86.0 on AIME 2024 — state-of-the-art among open-source 8B models and, per DeepSeek, comparable to the much larger Qwen3-235B-thinking. It is also MIT-licensed.