Overview
DeepSeek-R1-Distill-Qwen-32B is one of six dense distilled models DeepSeek open-sourced on January 20, 2025, alongside the full 671B DeepSeek-R1. It takes a Qwen2.5-32B base and fine-tunes it on roughly 800,000 reasoning traces generated by DeepSeek-R1, transferring R1's step-by-step "think-then-answer" behavior into a model small enough to self-host on a single high-memory GPU.
The pitch at launch was that you no longer needed a frontier-scale model to get strong reasoning. On DeepSeek's own evaluations, DeepSeek-R1-Distill-Qwen-32B outperformed OpenAI's o1-mini across math, science, and coding benchmarks, making it one of the strongest open-weight reasoning models in its size class at the time. It produces visible reasoning inside <think> tags before giving a final answer.
Because it is released under the MIT License (the underlying Qwen2.5 base carries Apache 2.0), the weights are freely downloadable from Hugging Face and have been widely re-hosted, quantized to GGUF/AWQ/GPTQ, and served by inference providers such as Groq, DeepInfra, Fireworks, Together, and Cloudflare Workers AI. DeepSeek recommends running it with a temperature around 0.6, top-p 0.95, and no system prompt.
| Released | 2025-01-20 |
|---|---|
| License | MIT License (model weights). The base Qwen2.5-32B it was distilled from is originally licensed under Apache 2.0. |
| Weights | Open weights |
| Parameters | ~32.8B (dense) |
| Context | 131,072 tokens (128K) |
| Max output | 32,768 tokens (recommended max generation length; some hosted providers cap lower) |
| Architecture | Dense transformer based on Qwen2.5-32B, supervised fine-tuned on 800k reasoning samples generated by DeepSeek-R1. It is a distilled chain-of-thought reasoning model, not RL-trained itself. |
| Knowledge cutoff | Inherits the Qwen2.5 base pretraining cutoff (DeepSeek did not publish a separate cutoff for the distill) |
| Modalities | text |
| Status | Available (open-weight, released January 2025; superseded by newer DeepSeek-R1 distill refreshes but never formally retired). |
Benchmarks
- AIME 2024 (pass@1)72.6%
- MATH-500 (pass@1)94.3%
- GPQA Diamond (pass@1)62.1%
- LiveCodeBench (pass@1)57.2%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.69 per 1M tokens per 1M tokens |
|---|---|
| Output | $0.69 per 1M tokens per 1M tokens |
Pricing is set by third-party hosts, not DeepSeek (the weights are free to self-host). Example: Groq lists $0.69/1M for both input and output; lower-cost hosts such as DeepInfra have listed around $0.27/1M.
Strengths
- Strong competition-math and reasoning performance for its size — 72.6% on AIME 2024 and 94.3% on MATH-500, beating OpenAI o1-mini
- Open MIT-licensed weights you can download, fine-tune, quantize, and self-host with no usage restrictions
- Fits on a single 48GB+ GPU (or runs quantized on consumer hardware), making frontier-style reasoning locally affordable
- 128K-token context window for long problems, large codebases, and multi-step proofs
- Broad ecosystem support: GGUF/AWQ/GPTQ quants plus hosting on Groq, DeepInfra, Fireworks, Together, and others
Best for
- Local and on-prem reasoning assistants where data cannot leave the building
- Competition-grade math and STEM problem solving (AIME, MATH-500-style tasks)
- Code generation, debugging, and algorithmic problem solving
- Building agents and tool-use pipelines on top of an inspectable, self-hostable reasoning model
- Research and fine-tuning baselines for distillation and chain-of-thought work
How to access
| Provider | Model ID |
|---|---|
| Groq ↗ | deepseek-r1-distill-qwen-32b |
| Cloudflare Workers AI ↗ | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b |
| NVIDIA NIM ↗ | deepseek-ai/deepseek-r1-distill-qwen-32b |
| OpenRouter ↗ | deepseek/deepseek-r1-distill-qwen-32b |
DeepSeek R1 Distill — every version
The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-R1-0528-Qwen3-8Bcurrent | 2025-05-29 | 131K | MIT |
| DeepSeek-R1-Distill-Llama-70B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-32B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-14B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Llama-8B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-7B | 2025-01-20 | — | Open weights |
| DeepSeek-R1-Distill-Qwen-1.5B | 2025-01-20 | — | Open weights |
FAQ
Is DeepSeek-R1-Distill-Qwen-32B the same as DeepSeek-R1?
No. DeepSeek-R1 is the full 671B (MoE) reasoning model trained with reinforcement learning. DeepSeek-R1-Distill-Qwen-32B is a much smaller 32B dense model: it takes a Qwen2.5-32B base and fine-tunes it on about 800,000 reasoning examples generated by DeepSeek-R1. It inherits R1's reasoning style but is a distilled student model, not R1 itself.
What license is it under, and can I use it commercially?
The model weights are released under the MIT License, which permits commercial use, modification, and redistribution. The Qwen2.5-32B base it was distilled from is originally licensed under Apache 2.0. Always review both licenses for your specific use case.
How does it compare to OpenAI o1-mini?
On DeepSeek's published benchmarks, the 32B distill outperforms o1-mini on math and coding: 72.6% vs 63.6% on AIME 2024, 94.3% vs 90.0% on MATH-500, and 62.1% vs 60.0% on GPQA Diamond. That made it one of the strongest open-weight reasoning models in its size class at its January 2025 launch.
What hardware do I need to run it?
At full BF16/FP16 precision the ~32.8B parameters need roughly 65GB+ of VRAM (e.g. an 80GB GPU like an A100/H100, or multiple GPUs). Quantized GGUF/AWQ/GPTQ builds (4-bit) shrink this to roughly 20GB, letting it run on a single 24GB consumer GPU or be served cheaply by API providers.