DeepSeek-R1-Distill-Qwen-1.5B

Name: DeepSeek-R1-Distill-Qwen-1.5B
Author: DeepSeek

The smallest R1 reasoning distill — chain-of-thought math on a 1.5B model that runs on a laptop.

Overview

DeepSeek-R1-Distill-Qwen-1.5B is the smallest member of DeepSeek's R1-Distill family, released alongside DeepSeek-R1 on 20 January 2025. Rather than being trained from scratch, it is the Qwen2.5-Math-1.5B base model fine-tuned on about 800,000 samples curated with the full 671B-parameter DeepSeek-R1 (roughly 600k reasoning examples plus 200k non-reasoning ones). The goal was to show that the long chain-of-thought reasoning behaviour discovered by R1 could be transplanted into tiny dense models that run on commodity hardware.
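Distillation here is ordinary supervised fine-tuning on teacher-written text: the student learns to predict the tokens of the curated responses. The sketch below shows one common way to lay out such a training example. It assumes Hugging Face-style labels, where -100 marks positions excluded from the loss (the default `ignore_index` of PyTorch's cross-entropy). DeepSeek has not published its training code, so the helper name and the choice to mask the prompt are illustrative assumptions, not DeepSeek's recipe.

```python
from typing import List, Tuple

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so these
# positions contribute nothing to the loss.
IGNORE_INDEX = -100

def build_sft_example(prompt_ids: List[int], teacher_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: join prompt and teacher completion into one training sequence.

    Only the teacher's completion (reasoning plus final answer) receives labels,
    so the student learns to reproduce the response, not the question.
    """
    input_ids = prompt_ids + teacher_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(teacher_ids)
    return input_ids, labels

# Toy token ids standing in for a tokenized prompt and teacher response.
ids, labels = build_sft_example([101, 102, 103], [7, 8, 9, 10])
print(ids)     # [101, 102, 103, 7, 8, 9, 10]
print(labels)  # [-100, -100, -100, 7, 8, 9, 10]
```

Both lists have the same length because Hugging Face causal-LM models apply the one-position shift between inputs and labels internally.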

At 1.5B parameters it is small enough to run on a laptop CPU or a modest GPU, yet it still produces R1-style step-by-step reasoning, wrapping its thinking in <think> tags before answering. DeepSeek presents the distills as evidence that distilling from a strong reasoner works better than running large-scale RL directly on a small model. On competition math, this 1.5B variant outperforms much larger non-reasoning models (DeepSeek's comparison puts it ahead of GPT-4o and Claude 3.5 Sonnet on AIME 2024 and MATH-500) while needing a fraction of the memory.

The model is released under the MIT License with fully open weights on Hugging Face, allowing commercial use, modification and further distillation. It is derived from Qwen2.5 (originally licensed under Apache 2.0), a lineage DeepSeek notes explicitly in the model card. It is text-only and is best treated as a math and reasoning specialist rather than a general chat assistant.

Released	2025-01-20
License	MIT
Weights	Open weights
Parameters	1.5B (dense; built on Qwen2.5-Math-1.5B)
Context	Up to 128K tokens (config max_position_embeddings 131,072; the Qwen2.5-Math-1.5B base is natively 4K, extended by DeepSeek)
Max output	32,768 tokens (recommended max generation length used in DeepSeek's own benchmarks)
Architecture	Dense Transformer. Takes the Qwen2.5-Math-1.5B base model and applies supervised fine-tuning on about 800k samples curated with the full DeepSeek-R1 model. This is pure distillation, with no additional reinforcement-learning stage on the small model. It emits explicit chain-of-thought inside <think>...</think> tags before its final answer.
Knowledge cutoff	Inherited from the Qwen2.5-Math-1.5B base (DeepSeek does not publish a separate cutoff for the distills)
Modalities	Text
Status	Available — open weights. Superseded for reasoning by DeepSeek-R1-0528-Qwen3-8B (May 2025), but still distributed and widely used for on-device inference.
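Because the reasoning and the final answer arrive in one completion, applications usually separate them before display or scoring. A minimal sketch, assuming plain-text output with a single `</think>` closing tag. Depending on the chat template in use, the opening `<think>` may already be part of the prompt rather than the generated text, so it is treated as optional. The function name `split_reasoning` is hypothetical.

```python
from typing import Tuple

OPEN, CLOSE = "<think>", "</think>"

def split_reasoning(text: str) -> Tuple[str, str]:
    """Hypothetical helper: split an R1-style completion into (reasoning, answer)."""
    head, sep, tail = text.partition(CLOSE)
    if not sep:
        # No closing tag: either generation stopped mid-thought (opening tag present)
        # or the model answered without a reasoning block.
        stripped = text.strip()
        if stripped.startswith(OPEN):
            return stripped[len(OPEN):].strip(), ""
        return "", stripped
    reasoning = head.strip()
    # The opening tag may be missing if the chat template already placed it in the prompt.
    if reasoning.startswith(OPEN):
        reasoning = reasoning[len(OPEN):].strip()
    return reasoning, tail.strip()

reasoning, answer = split_reasoning("<think>\n17 * 23 = 391.\n</think>\n\nThe answer is 391.")
print(answer)  # The answer is 391.
```

Treating a truncated trace as "reasoning with no answer" keeps scoring code from mistaking half-finished thoughts for a final answer.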

Benchmarks

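Scores as reported by DeepSeek, on a 0–100 scale; higher is better.

Benchmark	Score
MATH-500 (pass@1)	83.9
AIME 2024 (pass@1)	28.9
AIME 2024 (cons@64)	52.7
GPQA Diamond (pass@1)	33.8
LiveCodeBench (pass@1)	16.9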

Strengths

Strongest competition-math performance of any sub-2B open model at release — 83.9% on MATH-500 and 28.9% pass@1 on AIME 2024, far above similarly sized base models
Tiny footprint: runs on consumer laptops (CPU with ~16GB RAM, or a low-end GPU) with no datacenter hardware
Genuine chain-of-thought reasoning distilled from full DeepSeek-R1, surfaced in <think> tags
Permissive MIT license with open weights — free to self-host, fine-tune and redistribute
Hosted by many inference providers and packaged for local runtimes (Ollama, llama.cpp, vLLM)
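The small-footprint claim follows from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A rough sketch, assuming the nominal 1.5B count (the exact count differs somewhat) and ignoring the KV cache, activations and the extra per-block scales that quantized formats such as GGUF store:

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 1.5e9  # nominal count from the model name (assumption; the exact count differs)

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.2f} GB")
```

That gives about 3 GB at 16-bit precision and about 0.75 GB at 4-bit, so a 16GB laptop leaves ample headroom for the KV cache, the runtime and the operating system.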

Best for

On-device and edge reasoning where a larger model won't fit
Math problem-solving and step-by-step tutoring
A cheap, fast draft/scaffold model for speculative decoding or as a router/first-pass reasoner
Research and teaching on reasoning distillation and small-model chain-of-thought
Local, private experimentation with R1-style reasoning at near-zero cost

How to access

Provider	Model ID
Hugging Face (weights)	`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
Ollama	`deepseek-r1:1.5b`
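Once `ollama pull deepseek-r1:1.5b` has fetched the model, a local Ollama server can be called over its REST API. The sketch below uses only the Python standard library and assumes Ollama's default address (`http://localhost:11434`) and its `/api/generate` endpoint with streaming disabled. The sampling values (temperature 0.6, top-p 0.95) mirror the settings DeepSeek reports for its evaluations; the helper names are hypothetical.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: unchanged default port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "deepseek-r1:1.5b") -> urllib.request.Request:
    """Hypothetical helper: prepare a non-streaming generate call for the local model."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {"temperature": 0.6, "top_p": 0.95},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("What is 17 * 23? Please reason step by step, and put your final answer within \\boxed{}.")` returns the model's text, which may include the `<think>` block. DeepSeek's model card suggests putting all instructions in the user prompt rather than a system prompt, and asking for a boxed final answer on math problems.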

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first.

Version	Released	Context	License
DeepSeek-R1-0528-Qwen3-8B (current)	2025-05-29	131K	MIT
DeepSeek-R1-Distill-Llama-70B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-32B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-14B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Llama-8B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-7B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-1.5B	2025-01-20	131K	MIT

FAQ

What is DeepSeek-R1-Distill-Qwen-1.5B?

It is the smallest of DeepSeek's R1-Distill models, released on 20 January 2025. It takes the Qwen2.5-Math-1.5B base model and fine-tunes it on about 800,000 samples curated with the full DeepSeek-R1, giving a 1.5B-parameter model that does R1-style chain-of-thought reasoning.

Is it open source and free to use commercially?

Yes. The weights are openly released on Hugging Face under the MIT License, which permits commercial use, modification and further distillation. The model is derived from Qwen2.5, which was originally Apache 2.0 licensed; DeepSeek notes this lineage in the model card.

How good is it at math and reasoning for its size?

Very strong for a sub-2B model. DeepSeek reports 83.9% on MATH-500 and 28.9% pass@1 (52.7% cons@64) on AIME 2024, plus 33.8% on GPQA Diamond — beating much larger non-reasoning models on competition math, though coding (16.9% on LiveCodeBench) is weaker.
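The two AIME numbers measure different things. DeepSeek's R1 paper describes pass@1 as correctness averaged over k sampled responses per question (sampled at temperature 0.6 and top-p 0.95), and cons@64 as a majority vote over 64 samples, which is why cons@64 is much higher. A toy sketch of both metrics, assuming the final answers have already been extracted as strings (for example from the `\boxed{}` in each response); the function names are hypothetical:

```python
from collections import Counter
from typing import List, Sequence, Tuple

def pass_at_1(correct: Sequence[bool]) -> float:
    """Fraction of the k sampled answers to one question that are correct."""
    return sum(correct) / len(correct)

def cons_at_k(answers: Sequence[str], reference: str) -> bool:
    """Majority vote over k sampled answers; ties go to the answer seen first."""
    majority, _count = Counter(answers).most_common(1)[0]
    return majority == reference

def benchmark_scores(samples: List[List[str]], references: List[str]) -> Tuple[float, float]:
    """Average both metrics over all questions and express them on a 0-100 scale."""
    p1 = [pass_at_1([a == ref for a in answers]) for answers, ref in zip(samples, references)]
    cons = [cons_at_k(answers, ref) for answers, ref in zip(samples, references)]
    return 100 * sum(p1) / len(p1), 100 * sum(cons) / len(cons)

# Toy data: two questions, four sampled answers each (cons@64 uses 64 per question).
samples = [["4", "4", "5", "4"], ["7", "7", "8", "9"]]
print(benchmark_scores(samples, ["4", "7"]))  # (62.5, 100.0)
```

Majority voting rewards a model that is right more often than it is wrong in any single direction, so a small model with noisy individual samples can gain a lot from it.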

Can it run locally?

Yes. At 1.5B parameters it fits on consumer hardware: it can run on a CPU with around 16GB of RAM or on a low-end GPU, and it is packaged for local runtimes such as Ollama (deepseek-r1:1.5b) and llama.cpp, and can be served with vLLM. DeepSeek's own evaluations allow up to 32,768 generated tokens, so budget for long reasoning traces.
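Long traces cost memory as well as time, because the KV cache grows linearly with sequence length. A back-of-the-envelope sketch, assuming the Qwen2.5-1.5B family's attention layout (28 layers, 2 key/value heads of dimension 128, taken from the base model's `config.json`; check the actual file before relying on it) and a 16-bit cache:

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """KV-cache size: one key and one value vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Assumed Qwen2.5-1.5B-family layout; verify against the model's config.json.
LAYERS, KV_HEADS, HEAD_DIM = 28, 2, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
full_trace = kv_cache_bytes(32_768, LAYERS, KV_HEADS, HEAD_DIM)
print(f"{per_token / 1024:.0f} KiB per token")            # 28 KiB per token
print(f"{full_trace / 2**30:.3f} GiB for 32,768 tokens")  # 0.875 GiB for 32,768 tokens
```

Under these assumptions a full 32,768-token trace needs under 1 GiB of cache on top of the weights. That is manageable on a laptop, but it adds up quickly when several requests run at once.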
