AI/TLDR

DeepSeek-R1-0528-Qwen3-8B

An 8B reasoning model that distills DeepSeek-R1-0528's chain-of-thought into the Qwen3-8B base.

Overview

DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter open-weight reasoning model from DeepSeek, released on 29 May 2025 alongside the larger DeepSeek-R1-0528 update. DeepSeek built it by distilling the chain-of-thought reasoning from R1-0528 into Qwen3-8B Base — in other words, it takes the reasoning traces produced by the big 685B R1-0528 model and uses them to post-train Alibaba's small Qwen3-8B base model. The result is a compact model that thinks step by step before answering.

Its network architecture is identical to Qwen3-8B, so it runs with the same inference setup, but it ships with DeepSeek's own tokenizer configuration rather than Qwen's. DeepSeek recommends a sampling temperature of 0.6, supports system prompts, and (unlike some earlier R1 distills) does not require you to manually prepend a thinking token to trigger reasoning. The whole model is released under the permissive MIT License, which allows commercial use and further distillation.

DeepSeek positions DeepSeek-R1-0528-Qwen3-8B as a way to bring R1-grade reasoning to commodity hardware. On its own model card, DeepSeek reports that it reaches state-of-the-art results among open-source models on AIME 2024 — beating the original Qwen3-8B by 10 points and roughly matching the much larger Qwen3-235B-thinking on that math test.

Released2025-05-29
LicenseMIT
WeightsOpen weights
Parameters8B
Context131K
ArchitectureDense transformer (Qwen3-8B base)
ModalitiesText
StatusGenerally available

Benchmarks

  1. AIME 202486%
  2. AIME 202576.3%
  3. HMMT Feb 202561.5%
  4. GPQA Diamond61.1%
  5. LiveCodeBench (2408-2505)60.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

  • State-of-the-art math reasoning for an 8B open model — 86.0 on AIME 2024 per DeepSeek's model card, +10 points over Qwen3-8B.
  • Small enough to run locally on a single consumer GPU (or quantized on a laptop), bringing R1-style chain-of-thought to commodity hardware.
  • Fully open under the MIT License, with weights on Hugging Face, so it can be self-hosted, fine-tuned, and distilled commercially.
  • Drop-in Qwen3-8B inference compatibility (same architecture), with system-prompt support and no manual think-token needed.

Best for

  • Local and on-device reasoning where a hosted frontier model is impractical or privacy-sensitive.
  • Math and competition-style problem solving (AIME / HMMT) on a small footprint.
  • Cost-controlled coding and logic tasks via self-hosting under an open license.
  • A base for further fine-tuning or distillation experiments under MIT.

How to access

ProviderModel ID
OpenRouter ↗deepseek/deepseek-r1-0528-qwen3-8b

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-R1-0528-Qwen3-8Bcurrent2025-05-29131KMIT
DeepSeek-R1-Distill-Llama-70B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-32B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-14B2025-01-20Open weights
DeepSeek-R1-Distill-Llama-8B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-7B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-1.5B2025-01-20Open weights

FAQ

What is DeepSeek-R1-0528-Qwen3-8B?

It is an 8-billion-parameter open-weight reasoning model that DeepSeek created by distilling the chain-of-thought from its larger DeepSeek-R1-0528 model into Alibaba's Qwen3-8B Base. It thinks step by step before answering and is released under the MIT License.

How is it different from the original DeepSeek-R1-Distill-Qwen models?

The earlier January 2025 distills were built on Qwen 2.5 and Llama bases. This one, released 29 May 2025, distills the updated R1-0528 reasoning traces onto a newer Qwen3-8B base, which DeepSeek reports lifts AIME 2024 to 86.0 — state-of-the-art among open 8B models.

Can I run DeepSeek-R1-0528-Qwen3-8B locally?

Yes. Its architecture is identical to Qwen3-8B, so it runs with standard Qwen3-8B inference tooling, and at 8B parameters it fits on a single consumer GPU (or quantized GGUF builds on a laptop). DeepSeek recommends a temperature of 0.6.

What license does it use?

DeepSeek-R1-0528-Qwen3-8B is released under the MIT License, which permits commercial use and further model distillation.