DeepSeek-R1-0528-Qwen3-8B

Name: DeepSeek-R1-0528-Qwen3-8B
Author: DeepSeek

An 8B reasoning model that distills DeepSeek-R1-0528's chain-of-thought into the Qwen3-8B base.

Overview

DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter open-weight reasoning model from DeepSeek, released on 29 May 2025 alongside the larger DeepSeek-R1-0528 update. DeepSeek built it by distilling the chain-of-thought reasoning from R1-0528 into Qwen3-8B Base — in other words, it takes the reasoning traces produced by the big 685B R1-0528 model and uses them to post-train Alibaba's small Qwen3-8B base model. The result is a compact model that thinks step by step before answering.

Its network architecture is identical to Qwen3-8B, so it runs with the same inference setup, but it ships with DeepSeek's own tokenizer configuration rather than Qwen's. DeepSeek recommends a sampling temperature of 0.6, supports system prompts, and (unlike some earlier R1 distills) does not require you to manually prepend a thinking token to trigger reasoning. The whole model is released under the permissive MIT License, which allows commercial use and further distillation.

DeepSeek positions DeepSeek-R1-0528-Qwen3-8B as a way to bring R1-grade reasoning to commodity hardware. On its own model card, DeepSeek reports that it reaches state-of-the-art results among open-source models on AIME 2024 — beating the original Qwen3-8B by 10 points and roughly matching the much larger Qwen3-235B-thinking on that math test.

Released	2025-05-29
License	MIT
Weights	Open weights
Parameters	8B
Context	131K
Architecture	Dense transformer (Qwen3-8B base)
Modalities	Text
Status	Generally available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

State-of-the-art math reasoning for an 8B open model — 86.0 on AIME 2024 per DeepSeek's model card, +10 points over Qwen3-8B.
Small enough to run locally on a single consumer GPU (or quantized on a laptop), bringing R1-style chain-of-thought to commodity hardware.
Fully open under the MIT License, with weights on Hugging Face, so it can be self-hosted, fine-tuned, and distilled commercially.
Drop-in Qwen3-8B inference compatibility (same architecture), with system-prompt support and no manual think-token needed.

Best for

Local and on-device reasoning where a hosted frontier model is impractical or privacy-sensitive.
Math and competition-style problem solving (AIME / HMMT) on a small footprint.
Cost-controlled coding and logic tasks via self-hosting under an open license.
A base for further fine-tuning or distillation experiments under MIT.

How to access

Provider	Model ID
OpenRouter ↗	`deepseek/deepseek-r1-0528-qwen3-8b`

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-R1-0528-Qwen3-8Bcurrent	2025-05-29	131K	MIT
DeepSeek-R1-Distill-Llama-70B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-32B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-14B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Llama-8B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-7B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-1.5B	2025-01-20	—	Open weights

FAQ

What is DeepSeek-R1-0528-Qwen3-8B?

It is an 8-billion-parameter open-weight reasoning model that DeepSeek created by distilling the chain-of-thought from its larger DeepSeek-R1-0528 model into Alibaba's Qwen3-8B Base. It thinks step by step before answering and is released under the MIT License.

How is it different from the original DeepSeek-R1-Distill-Qwen models?

The earlier January 2025 distills were built on Qwen 2.5 and Llama bases. This one, released 29 May 2025, distills the updated R1-0528 reasoning traces onto a newer Qwen3-8B base, which DeepSeek reports lifts AIME 2024 to 86.0 — state-of-the-art among open 8B models.

Can I run DeepSeek-R1-0528-Qwen3-8B locally?

Yes. Its architecture is identical to Qwen3-8B, so it runs with standard Qwen3-8B inference tooling, and at 8B parameters it fits on a single consumer GPU (or quantized GGUF builds on a laptop). DeepSeek recommends a temperature of 0.6.

What license does it use?

DeepSeek-R1-0528-Qwen3-8B is released under the MIT License, which permits commercial use and further model distillation.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// DeepSeek R1 Distill — every version

// FAQ