DeepSeek-R1-0528

Name: DeepSeek-R1-0528
Author: DeepSeek

DeepSeek's open-weight R1 upgrade with deeper reasoning, fewer hallucinations, and function calling — AIME 2025 up from 70 to 87.5.

Overview

DeepSeek-R1-0528 is the May 28, 2025 update to DeepSeek's first-generation reasoning line, DeepSeek R1. Like the original R1, it is a Mixture-of-Experts model built on the DeepSeek-V3 base — 671 billion total parameters with roughly 37 billion active per token (the Hugging Face card lists 685B, which includes the multi-token-prediction module). It is released as open weights under the permissive MIT license, which allows commercial use and distillation.
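
The parameter figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in this section; the helper names are illustrative and not part of any DeepSeek tooling.

```python
# Figures quoted in the overview, in billions of parameters.
TOTAL_PARAMS_B = 671       # total Mixture-of-Experts parameters
ACTIVE_PARAMS_B = 37       # approximate parameters active per token
HF_LISTED_PARAMS_B = 685   # figure shown on the Hugging Face card


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters that are active for a single token."""
    return active_b / total_b


def mtp_overhead_b(hf_listed_b: float, total_b: float) -> float:
    """Gap between the Hugging Face figure and the main-model figure."""
    return hf_listed_b - total_b


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {frac:.1%}")
    print(f"Extra parameters on the HF card: {mtp_overhead_b(HF_LISTED_PARAMS_B, TOTAL_PARAMS_B)}B")
```

Roughly one parameter in eighteen is active per token, which is why per-token compute is far lower than the 671B headline figure suggests.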

The 0528 revision is a same-architecture refresh rather than a new model: DeepSeek added post-training compute and algorithmic optimizations that deepen the model's reasoning. Average reasoning length per question roughly doubled (about 12K to 23K tokens on AIME), and benchmark scores jumped accordingly — AIME 2025 rose from 70.0 to 87.5, AIME 2024 from 79.8 to 91.4, and LiveCodeBench from 63.5 to 73.3. DeepSeek also reports a reduced hallucination rate, native support for JSON output and function calling, and system-prompt support, with no need to prepend a thinking tag to trigger reasoning.
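
To make the size of these gains concrete, the following sketch computes the point deltas and the reasoning-length ratio from the figures quoted above. The 12K and 23K values are the approximate averages given in the text, so the ratio is approximate as well.

```python
# (original R1 score, R1-0528 score) as quoted in this section.
SCORES = {
    "AIME 2025": (70.0, 87.5),
    "AIME 2024": (79.8, 91.4),
    "LiveCodeBench": (63.5, 73.3),
}


def score_deltas(scores: dict) -> dict:
    """Absolute point gain per benchmark, rounded to one decimal place."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


def length_ratio(old_tokens: int, new_tokens: int) -> float:
    """How many times longer the average reasoning trace became."""
    return new_tokens / old_tokens


if __name__ == "__main__":
    for name, delta in score_deltas(SCORES).items():
        print(f"{name}: +{delta} points")
    print(f"Reasoning length ratio: {length_ratio(12_000, 23_000):.2f}x")
```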

DeepSeek-R1-0528 has a 163,840-token (about 164K) context window and is text-only — no vision, audio, or PDF input. Alongside the flagship, DeepSeek distilled its chain-of-thought into a small model, DeepSeek-R1-0528-Qwen3-8B (fine-tuned on Qwen3-8B Base), which reaches 86.0 on AIME 2024 — state-of-the-art among open 8B models. The weights are on Hugging Face and the hosted API exposes the model as deepseek-reasoner; it was positioned as an open challenger to OpenAI o3 and Gemini 2.5 Pro.
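
A practical consequence of the 163,840-token window is that long reasoning traces compete with the prompt for space. The sketch below assumes that prompt tokens and generated tokens (reasoning included) share a single window, which is the common convention but should be confirmed with the serving provider. The function names are illustrative.

```python
CONTEXT_WINDOW = 163_840  # tokens, as stated in this section (160 * 1024)


def max_prompt_tokens(reserved_output_tokens: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt that still leaves room for the reserved output budget."""
    if not 0 <= reserved_output_tokens <= context_window:
        raise ValueError("reserved output must fit inside the context window")
    return context_window - reserved_output_tokens


def fits_in_context(prompt_tokens: int, reserved_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if prompt plus reserved output stays within the window."""
    return prompt_tokens + reserved_output_tokens <= context_window


if __name__ == "__main__":
    # Reserving 32K tokens of output leaves this much room for the prompt.
    print(max_prompt_tokens(32 * 1024))
```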

Released	2025-05-28
License	MIT
Weights	Open weights
Parameters	671B total / 37B active (685B on Hugging Face incl. MTP module)
Context	164K
Max output	32K tokens (64K max generation length)
Architecture	Mixture-of-Experts transformer built on the DeepSeek-V3 base, post-trained with large-scale reinforcement learning to expose chain-of-thought. The 0528 update added extra post-training compute and algorithmic optimizations that roughly doubled per-question reasoning depth (about 12K to 23K tokens on AIME).
Knowledge cutoff	Not officially disclosed
Modalities	Text
Status	Generally available

Benchmarks

[Benchmark chart: scores on a 0–100 scale; higher is better.]

Pricing

Input	$0.50 / 1M tokens
Output	$2.15 / 1M tokens

Standard OpenRouter rate for the open-weight DeepSeek-R1-0528. Other hosts price it differently (Artificial Analysis lists one provider at $1.35 in / $4.20 out). DeepSeek's first-party API serves it as deepseek-reasoner; that endpoint's pricing now reflects a newer model, so the OpenRouter figure is cited here.
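
As a rough illustration of what these rates mean per request, the sketch below estimates cost from token counts. It assumes reasoning tokens are billed as output tokens, which is typical for reasoning models but should be confirmed with the provider. The function name and the example token counts are illustrative.

```python
# OpenRouter rates quoted above, in USD per one million tokens.
INPUT_USD_PER_M = 0.50
OUTPUT_USD_PER_M = 2.15


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = INPUT_USD_PER_M,
                      output_rate: float = OUTPUT_USD_PER_M) -> float:
    """Estimated request cost; output_tokens should include reasoning tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Example: a 2K-token prompt with a 23K-token response, matching the
    # average AIME reasoning length cited in the overview.
    print(f"${estimate_cost_usd(2_000, 23_000):.5f}")
```

Because reasoning traces are long, output tokens dominate the bill for this kind of workload.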


Strengths

Open weights under the permissive MIT license — free for commercial use, self-hosting, and distillation
Large gains in deep reasoning over the original R1: AIME 2025 87.5 (up from 70.0), AIME 2024 91.4, HMMT 2025 79.4
Strong competition math and coding — Codeforces rating ~1930, LiveCodeBench 73.3, Aider-Polyglot 71.6
Reduced hallucination rate versus the original R1, per DeepSeek
Native JSON output and function-calling support, plus system-prompt support without a manual thinking tag
Distilled DeepSeek-R1-0528-Qwen3-8B brings R1-grade reasoning to commodity hardware (86.0 on AIME 2024)

Best for

Competition-style math and multi-step logical reasoning
Coding and software-engineering tasks (LiveCodeBench, SWE-bench, Aider-style edits)
Agentic tool use and function-calling workflows that need structured JSON output
Self-hosted reasoning deployments where an open MIT-licensed model is required
Distillation: using R1-0528's chain-of-thought to train smaller, cheaper student models
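
For the function-calling use case, a minimal sketch of the client side is shown below. The source only states that function calling is supported; the wire format here is assumed to follow the OpenAI-style chat-completions convention (a `tools` list in the request, and tool calls returned with JSON-encoded arguments). The `get_weather` tool and its data are hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Tool schema in the assumed OpenAI-style format, sent with the request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict) -> str:
    """Run the requested tool and return its result as a JSON string."""
    fn = tool_call["function"]
    name = fn["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    args = json.loads(fn["arguments"])  # arguments arrive JSON-encoded
    return json.dumps(REGISTRY[name](**args))


if __name__ == "__main__":
    # Shape of a tool call as it might appear in a model response.
    sample = {"function": {"name": "get_weather",
                           "arguments": '{"city": "Paris"}'}}
    print(dispatch_tool_call(sample))
```

The returned string would normally be sent back to the model in a follow-up message so it can compose its final answer.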

How to access

Provider	Model ID
DeepSeek Platform	`deepseek-reasoner`
OpenRouter	`deepseek/deepseek-r1-0528`

DeepSeek R1 — every version

The full lineage of the DeepSeek R1 line, newest first.

Version	Released	Context	License
DeepSeek-R1-0528 (current)	2025-05-28	—	MIT
DeepSeek-R1	2025-01-20	—	MIT
DeepSeek-R1-Zero	2025-01-20	—	MIT

FAQ

What changed in DeepSeek-R1-0528 versus the original DeepSeek-R1?

DeepSeek-R1-0528 keeps the same Mixture-of-Experts architecture on the V3 base but adds post-training compute and algorithmic tuning that deepen its reasoning. Per-question reasoning roughly doubled (about 12K to 23K tokens on AIME), and scores rose across the board — AIME 2025 from 70.0 to 87.5, AIME 2024 from 79.8 to 91.4, and LiveCodeBench from 63.5 to 73.3. DeepSeek also reports fewer hallucinations and added JSON output and function-calling support.

Is DeepSeek-R1-0528 open source and free to use?

The weights are released under the MIT license on Hugging Face, so you can download, self-host, fine-tune, distill, and use them commercially for free. DeepSeek also offers a hosted API (where the model is exposed as deepseek-reasoner), and several third parties such as OpenRouter serve it for a per-token fee.

How big is DeepSeek-R1-0528 and what is its context window?

It is a Mixture-of-Experts model with 671 billion total parameters and about 37 billion active per token (the Hugging Face card lists 685B, which includes the multi-token-prediction module). Its context window is 163,840 tokens — roughly 164K — and it is text-only.

What is DeepSeek-R1-0528-Qwen3-8B?

It is a small distilled model released alongside the flagship: DeepSeek used chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3-8B Base. It scores 86.0 on AIME 2024 — state-of-the-art among open-source 8B models and, per DeepSeek, comparable to the much larger Qwen3-235B-thinking. It is also MIT-licensed.
