Overview
DeepSeek-R1-Distill-Qwen-1.5B is the smallest member of DeepSeek's R1-Distill family, released alongside DeepSeek-R1 on 20 January 2025. Rather than being trained from scratch, it is a Qwen2.5-Math-1.5B base model fine-tuned on about 800,000 samples curated with the full 671B-parameter DeepSeek-R1, most of them long reasoning traces. The goal was to show that the long chain-of-thought reasoning behaviour discovered by R1 could be transplanted into tiny dense models that run on commodity hardware.
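The distillation step is ordinary supervised fine-tuning on teacher output, so a single training example takes only a few lines to sketch. This illustrates the idea under stated assumptions and is not DeepSeek's published pipeline. The whitespace tokenizer, the helper names and the exact target layout are hypothetical. Positions labelled -100 follow the common PyTorch convention for tokens excluded from the cross-entropy loss.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def format_target(reasoning: str, answer: str) -> str:
    """Hypothetical target layout: teacher reasoning in <think> tags, then the answer."""
    return f"<think> {reasoning} </think> {answer}"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Toy vocabulary: one id per distinct whitespace-separated token."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_sft_example(prompt: str, target: str, vocab: Dict[str, int]) -> Dict[str, List[int]]:
    """Concatenate prompt and teacher completion; only the completion is trained on."""
    prompt_ids = [vocab[t] for t in prompt.split()]
    target_ids = [vocab[t] for t in target.split()]
    # Prompt positions get IGNORE_INDEX so the loss covers only teacher-written tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return {"input_ids": prompt_ids + target_ids, "labels": labels}


# Usage: one toy problem with a teacher-written trace.
prompt = "What is 2 + 3 ?"
target = format_target("2 plus 3 equals 5", "5")
vocab = build_vocab([prompt, target])
example = make_sft_example(prompt, target, vocab)
print(example)
```

Masking the prompt means the student is graded only on reproducing the teacher's reasoning and answer. That completion is where R1's chain-of-thought style lives.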
At 1.5B parameters it is small enough to run on a laptop CPU or a modest GPU, yet it still produces R1-style step-by-step reasoning, wrapping its thinking in <think> tags before answering. DeepSeek positions the distills as a demonstration that distillation from a strong reasoner beats large-scale RL on a small model: this 1.5B variant outperforms much larger non-reasoning models on competition math while needing a fraction of the memory.
The model is released under the MIT License with fully open weights on Hugging Face, allowing commercial use, modification and further distillation. Because it is derived from Qwen2.5 (originally Apache 2.0), DeepSeek notes the lineage explicitly in the model card. It is text-only and is best treated as a math/reasoning specialist rather than a general chat assistant.
| Released | 2025-01-20 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 1.5B (dense; built on Qwen2.5-Math-1.5B) |
| Context | Up to 128K tokens (config max_position_embeddings 131,072; the Qwen2.5-Math-1.5B base is natively 4K, extended by DeepSeek) |
| Max output | 32,768 tokens (the maximum generation length DeepSeek used in its own evaluations) |
| Architecture | Dense Transformer. Takes the Qwen2.5-Math-1.5B base model and supervised-fine-tunes it on about 800k samples curated with the full DeepSeek-R1 model: a pure distillation with no additional reinforcement-learning stage on the small model. It emits explicit chain-of-thought inside <think>...</think> tags before its final answer. |
| Knowledge cutoff | Inherited from the Qwen2.5-Math-1.5B base (DeepSeek does not publish a separate cutoff for the distills) |
| Modalities | Text |
| Status | Available — open weights. Superseded for reasoning by DeepSeek-R1-0528-Qwen3-8B (May 2025), but still distributed and widely used for on-device inference. |
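The reasoning trace and the final answer arrive in one text stream, so applications usually split them before display. Below is a minimal sketch that assumes the output follows the <think>...</think> layout described above. The function name `split_reasoning` is hypothetical. The handling of a missing opening tag rests on the assumption that some chat templates pre-fill `<think>` in the prompt, so the completion may contain only the closing tag.

```python
from typing import Optional, Tuple

OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from one model completion."""
    head, sep, tail = text.partition(CLOSE_TAG)
    if not sep:
        stripped = text.strip()
        if stripped.startswith(OPEN_TAG):
            # Opened but never closed: most likely cut off mid-reasoning.
            return stripped[len(OPEN_TAG):].strip(), ""
        # No tags at all (or a truncated trace whose <think> was pre-filled
        # by the chat template, which cannot be told apart): treat as answer.
        return None, stripped
    reasoning = head.strip()
    # The opening tag may be absent if the chat template already supplied it.
    if reasoning.startswith(OPEN_TAG):
        reasoning = reasoning[len(OPEN_TAG):].strip()
    return reasoning, tail.strip()
```

When a tag is opened but never closed, the function returns an empty answer instead of guessing. The caller can then decide whether to retry with a larger token budget.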
Benchmarks
- AIME 2024 (pass@1): 28.9%
- AIME 2024 (cons@64): 52.7%
- MATH-500 (pass@1): 83.9%
- GPQA Diamond (pass@1): 33.8%
- LiveCodeBench (pass@1): 16.9%
Scores are percentages on a 0–100 scale, as reported by DeepSeek; higher is better.
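The two AIME figures differ because they are scored differently. DeepSeek's R1 report describes pass@1 as the mean correctness over k sampled responses per problem (temperature 0.6, top-p 0.95). It describes cons@64 as majority voting over 64 samples. The sketch below assumes answers have already been extracted and normalized to plain strings; real graders handle that step with more care. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence

Metric = Callable[[Sequence[str], str], float]


def pass_at_1(answers: Sequence[str], reference: str) -> float:
    """Fraction of the k sampled answers that match the reference."""
    return sum(a == reference for a in answers) / len(answers)


def cons_at_k(answers: Sequence[str], reference: str) -> float:
    """1.0 if the majority-vote answer matches the reference, else 0.0."""
    # Counter.most_common breaks ties by first occurrence; real harnesses may differ.
    majority, _ = Counter(answers).most_common(1)[0]
    return float(majority == reference)


def benchmark_score(samples: List[Sequence[str]], references: List[str], metric: Metric) -> float:
    """Average a per-problem metric over the benchmark, as a 0-100 percentage."""
    return 100 * sum(metric(a, r) for a, r in zip(samples, references)) / len(references)


# Usage: two problems, four sampled answers each.
samples = [["4", "4", "5", "4"], ["7", "3", "3", "9"]]
references = ["4", "3"]
print(benchmark_score(samples, references, pass_at_1))
print(benchmark_score(samples, references, cons_at_k))
```

Majority voting credits a problem whenever the right answer comes up more often than any single wrong one. That is why cons@64 (52.7%) sits well above pass@1 (28.9%) on AIME 2024.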
Strengths
- Exceptionally strong competition-math performance for a sub-2B open model at release: 83.9% on MATH-500 and 28.9% pass@1 on AIME 2024, far above similarly sized base models
- Tiny footprint: runs on consumer laptops (a CPU with ~16GB RAM, or a low-end GPU) without datacenter hardware
- Genuine chain-of-thought reasoning distilled from full DeepSeek-R1, surfaced in <think> tags
- Permissive MIT license with open weights — free to self-host, fine-tune and redistribute
- Hosted by many inference providers and supported by local runtimes and serving engines (Ollama, llama.cpp, vLLM)
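Simple arithmetic shows why the footprint is small. This back-of-envelope sketch assumes a nominal 1.5e9 parameters. It takes the layer and attention-head counts (28 layers, 2 key/value heads of dimension 128) from the Qwen2.5-1.5B-style config, which is worth re-checking against the model's own `config.json`. Activations and runtime overhead are ignored.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Raw storage for the weights at a given precision."""
    return n_params * bytes_per_param


def kv_cache_bytes(tokens: int, layers: int = 28, kv_heads: int = 2,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """KV cache size; the architecture defaults are assumptions to verify."""
    # Factor 2: one key vector and one value vector per layer, KV head and token.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    for label, bpp in [("fp32", 4.0), ("bf16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"weights {label}: {weight_bytes(1.5e9, bpp) / GIB:.2f} GiB")
    print(f"KV cache, 32,768 tokens (bf16): {kv_cache_bytes(32_768) / GIB:.2f} GiB")
```

Under these assumptions, bf16 weights take roughly 2.8 GiB and a full 32,768-token KV cache takes under 1 GiB. Even an fp32 CPU run, which doubles the weight bytes, stays well inside 16GB. Real 4-bit formats store extra scaling data, so quantized files come out somewhat larger than the raw estimate.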
Best for
- On-device and edge reasoning where a larger model won't fit
- Math problem-solving and step-by-step tutoring
- A cheap, fast draft/scaffold model for speculative decoding or as a router/first-pass reasoner
- Research and teaching on reasoning distillation and small-model chain-of-thought
- Local, private experimentation with R1-style reasoning at near-zero cost
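Speculative decoding, listed above, pairs a small draft model with a larger target. The draft guesses several tokens, the target verifies them, and every accepted guess saves a full target step. The toy sketch below shows the greedy variant, with plain Python callables standing in for models. In practice the 1.5B distill would be the draft, and the target would need to share its tokenizer. Function names are hypothetical. For clarity the target is called once per position, whereas real engines score all drafted tokens in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_greedy(prefix: List[int], draft: NextToken, target: NextToken,
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches greedy decoding with `target`."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. The draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposal and keeps the longest agreeing prefix.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # the target's choice replaces the miss
                break
            accepted.append(tok)
        else:
            # All k accepted: the target adds one more token (free in a batched pass).
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prefix) + max_new]
```

Every emitted token is checked against the target's own greedy choice, so the output is identical to decoding with the target alone. The draft only changes how many target passes are needed.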
How to access
| Provider | Model ID |
|---|---|
| Hugging Face (weights) | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| Ollama | deepseek-r1:1.5b |
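For local use, Ollama exposes a REST endpoint once the model has been pulled. The minimal standard-library sketch below assumes Ollama is running on its default address (`http://localhost:11434`) and that `ollama pull deepseek-r1:1.5b` has been run. The prompt follows DeepSeek's model-card advice: temperature 0.6, no system prompt, and a step-by-step directive for math. Top-p 0.95 mirrors DeepSeek's evaluation setup. The `num_ctx` value and the helper names are illustrative choices.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address
MATH_DIRECTIVE = "Please reason step by step, and put your final answer within \\boxed{}."


def build_payload(question: str) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": "deepseek-r1:1.5b",
        # All instructions go in the user prompt; no "system" field is set.
        "prompt": f"{question}\n{MATH_DIRECTIVE}",
        "stream": False,  # return one JSON object instead of a token stream
        "options": {
            "temperature": 0.6,
            "top_p": 0.95,
            # Illustrative: raise the context window so long traces are not cut off.
            "num_ctx": 32_768,
        },
    }


def ask(question: str) -> str:
    """Send one question to the local server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("What is the sum of the first 10 positive odd numbers?")` returns the generated text. Depending on the Ollama version, the reasoning may appear inline between <think> tags or in a separate field of the response, so check the JSON your installation returns.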
DeepSeek R1 Distill — every version
The full lineage of the DeepSeek R1 Distill line, newest first.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-R1-0528-Qwen3-8B (current) | 2025-05-29 | 131K | MIT |
| DeepSeek-R1-Distill-Llama-70B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-32B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-14B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Llama-8B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-7B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-1.5B | 2025-01-20 | 131K | MIT |
FAQ
What is DeepSeek-R1-Distill-Qwen-1.5B?
It is the smallest of DeepSeek's R1-Distill models, released on 20 January 2025. It takes the Qwen2.5-Math-1.5B base model and fine-tunes it on about 800,000 samples curated with the full DeepSeek-R1, giving a 1.5B-parameter model that does R1-style chain-of-thought reasoning.
Is it open source and free to use commercially?
Yes. The weights are openly released on Hugging Face under the MIT License, which permits commercial use, modification and further distillation. The model is derived from Qwen2.5, which was originally Apache 2.0 licensed; DeepSeek notes this lineage in the model card.
How good is it at math and reasoning for its size?
Very strong for a sub-2B model. DeepSeek reports 83.9% on MATH-500 and 28.9% pass@1 (52.7% cons@64) on AIME 2024, plus 33.8% on GPQA Diamond — beating much larger non-reasoning models on competition math, though coding (16.9% on LiveCodeBench) is weaker.
Can it run locally?
Yes. At 1.5B parameters it fits on consumer hardware: it can run on a CPU with around 16GB of RAM or a low-end GPU, and it is supported by local runtimes such as Ollama (deepseek-r1:1.5b), llama.cpp and vLLM. DeepSeek's own evaluations allowed up to 32,768 generated tokens, so budget for long reasoning traces.