Overview
DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter open-weight reasoning model from DeepSeek, released on 29 May 2025 alongside the larger DeepSeek-R1-0528 update. DeepSeek built it by distilling the chain-of-thought reasoning from R1-0528 into Qwen3-8B Base — in other words, it takes the reasoning traces produced by the big 685B R1-0528 model and uses them to post-train Alibaba's small Qwen3-8B base model. The result is a compact model that thinks step by step before answering.
Its network architecture is identical to Qwen3-8B, so it runs with the same inference setup, but it ships with DeepSeek's own tokenizer configuration rather than Qwen's. DeepSeek recommends a sampling temperature of 0.6, supports system prompts, and (unlike some earlier R1 distills) does not require you to manually prepend a thinking token to trigger reasoning. The whole model is released under the permissive MIT License, which allows commercial use and further distillation.
DeepSeek positions DeepSeek-R1-0528-Qwen3-8B as a way to bring R1-grade reasoning to commodity hardware. On its own model card, DeepSeek reports that it reaches state-of-the-art results among open-source models on AIME 2024 — beating the original Qwen3-8B by 10 points and roughly matching the much larger Qwen3-235B-thinking on that math test.
| Released | 2025-05-29 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 8B |
| Context | 131K |
| Architecture | Dense transformer (Qwen3-8B base) |
| Modalities | Text |
| Status | Generally available |
Benchmarks
- AIME 202486%
- AIME 202576.3%
- HMMT Feb 202561.5%
- GPQA Diamond61.1%
- LiveCodeBench (2408-2505)60.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- State-of-the-art math reasoning for an 8B open model — 86.0 on AIME 2024 per DeepSeek's model card, +10 points over Qwen3-8B.
- Small enough to run locally on a single consumer GPU (or quantized on a laptop), bringing R1-style chain-of-thought to commodity hardware.
- Fully open under the MIT License, with weights on Hugging Face, so it can be self-hosted, fine-tuned, and distilled commercially.
- Drop-in Qwen3-8B inference compatibility (same architecture), with system-prompt support and no manual think-token needed.
Best for
- Local and on-device reasoning where a hosted frontier model is impractical or privacy-sensitive.
- Math and competition-style problem solving (AIME / HMMT) on a small footprint.
- Cost-controlled coding and logic tasks via self-hosting under an open license.
- A base for further fine-tuning or distillation experiments under MIT.
How to access
| Provider | Model ID |
|---|---|
| OpenRouter ↗ | deepseek/deepseek-r1-0528-qwen3-8b |
DeepSeek R1 Distill — every version
The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-R1-0528-Qwen3-8Bcurrent | 2025-05-29 | 131K | MIT |
| DeepSeek-R1-Distill-Llama-70B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-32B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-14B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Llama-8B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-7B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-1.5B | 2025-01-20 | — | Open weights |
FAQ
What is DeepSeek-R1-0528-Qwen3-8B?
It is an 8-billion-parameter open-weight reasoning model that DeepSeek created by distilling the chain-of-thought from its larger DeepSeek-R1-0528 model into Alibaba's Qwen3-8B Base. It thinks step by step before answering and is released under the MIT License.
How is it different from the original DeepSeek-R1-Distill-Qwen models?
The earlier January 2025 distills were built on Qwen 2.5 and Llama bases. This one, released 29 May 2025, distills the updated R1-0528 reasoning traces onto a newer Qwen3-8B base, which DeepSeek reports lifts AIME 2024 to 86.0 — state-of-the-art among open 8B models.
Can I run DeepSeek-R1-0528-Qwen3-8B locally?
Yes. Its architecture is identical to Qwen3-8B, so it runs with standard Qwen3-8B inference tooling, and at 8B parameters it fits on a single consumer GPU (or quantized GGUF builds on a laptop). DeepSeek recommends a temperature of 0.6.
What license does it use?
DeepSeek-R1-0528-Qwen3-8B is released under the MIT License, which permits commercial use and further model distillation.