AI/TLDR

DeepSeek-R1-Distill-Qwen-32B

A 32B reasoning model distilled from DeepSeek-R1 that beat OpenAI o1-mini on math and code.

Overview

DeepSeek-R1-Distill-Qwen-32B is one of six dense distilled models DeepSeek open-sourced on January 20, 2025, alongside the full 671B DeepSeek-R1. It takes a Qwen2.5-32B base and fine-tunes it on roughly 800,000 reasoning traces generated by DeepSeek-R1, transferring R1's step-by-step "think-then-answer" behavior into a model small enough to self-host on a single high-memory GPU.

The pitch at launch was that you no longer needed a frontier-scale model to get strong reasoning. On DeepSeek's own evaluations, DeepSeek-R1-Distill-Qwen-32B outperformed OpenAI's o1-mini across math, science, and coding benchmarks, making it one of the strongest open-weight reasoning models in its size class at the time. It produces visible reasoning inside <think> tags before giving a final answer.

Because it is released under the MIT License (the underlying Qwen2.5 base carries Apache 2.0), the weights are freely downloadable from Hugging Face and have been widely re-hosted, quantized to GGUF/AWQ/GPTQ, and served by inference providers such as Groq, DeepInfra, Fireworks, Together, and Cloudflare Workers AI. DeepSeek recommends running it with a temperature around 0.6, top-p 0.95, and no system prompt.

Released2025-01-20
LicenseMIT License (model weights). The base Qwen2.5-32B it was distilled from is originally licensed under Apache 2.0.
WeightsOpen weights
Parameters~32.8B (dense)
Context131,072 tokens (128K)
Max output32,768 tokens (recommended max generation length; some hosted providers cap lower)
ArchitectureDense transformer based on Qwen2.5-32B, supervised fine-tuned on 800k reasoning samples generated by DeepSeek-R1. It is a distilled chain-of-thought reasoning model, not RL-trained itself.
Knowledge cutoffInherits the Qwen2.5 base pretraining cutoff (DeepSeek did not publish a separate cutoff for the distill)
Modalitiestext
StatusAvailable (open-weight, released January 2025; superseded by newer DeepSeek-R1 distill refreshes but never formally retired).

Benchmarks

  1. AIME 2024 (pass@1)72.6%
  2. MATH-500 (pass@1)94.3%
  3. GPQA Diamond (pass@1)62.1%
  4. LiveCodeBench (pass@1)57.2%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.69 per 1M tokens per 1M tokens
Output$0.69 per 1M tokens per 1M tokens

Pricing is set by third-party hosts, not DeepSeek (the weights are free to self-host). Example: Groq lists $0.69/1M for both input and output; lower-cost hosts such as DeepInfra have listed around $0.27/1M.

Pricing source ↗

Strengths

  • Strong competition-math and reasoning performance for its size — 72.6% on AIME 2024 and 94.3% on MATH-500, beating OpenAI o1-mini
  • Open MIT-licensed weights you can download, fine-tune, quantize, and self-host with no usage restrictions
  • Fits on a single 48GB+ GPU (or runs quantized on consumer hardware), making frontier-style reasoning locally affordable
  • 128K-token context window for long problems, large codebases, and multi-step proofs
  • Broad ecosystem support: GGUF/AWQ/GPTQ quants plus hosting on Groq, DeepInfra, Fireworks, Together, and others

Best for

  • Local and on-prem reasoning assistants where data cannot leave the building
  • Competition-grade math and STEM problem solving (AIME, MATH-500-style tasks)
  • Code generation, debugging, and algorithmic problem solving
  • Building agents and tool-use pipelines on top of an inspectable, self-hostable reasoning model
  • Research and fine-tuning baselines for distillation and chain-of-thought work

How to access

ProviderModel ID
Groq ↗deepseek-r1-distill-qwen-32b
Cloudflare Workers AI ↗@cf/deepseek-ai/deepseek-r1-distill-qwen-32b
NVIDIA NIM ↗deepseek-ai/deepseek-r1-distill-qwen-32b
OpenRouter ↗deepseek/deepseek-r1-distill-qwen-32b

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-R1-0528-Qwen3-8Bcurrent2025-05-29131KMIT
DeepSeek-R1-Distill-Llama-70B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-32B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-14B2025-01-20Open weights
DeepSeek-R1-Distill-Llama-8B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-7B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-1.5B2025-01-20Open weights

FAQ

Is DeepSeek-R1-Distill-Qwen-32B the same as DeepSeek-R1?

No. DeepSeek-R1 is the full 671B (MoE) reasoning model trained with reinforcement learning. DeepSeek-R1-Distill-Qwen-32B is a much smaller 32B dense model: it takes a Qwen2.5-32B base and fine-tunes it on about 800,000 reasoning examples generated by DeepSeek-R1. It inherits R1's reasoning style but is a distilled student model, not R1 itself.

What license is it under, and can I use it commercially?

The model weights are released under the MIT License, which permits commercial use, modification, and redistribution. The Qwen2.5-32B base it was distilled from is originally licensed under Apache 2.0. Always review both licenses for your specific use case.

How does it compare to OpenAI o1-mini?

On DeepSeek's published benchmarks, the 32B distill outperforms o1-mini on math and coding: 72.6% vs 63.6% on AIME 2024, 94.3% vs 90.0% on MATH-500, and 62.1% vs 60.0% on GPQA Diamond. That made it one of the strongest open-weight reasoning models in its size class at its January 2025 launch.

What hardware do I need to run it?

At full BF16/FP16 precision the ~32.8B parameters need roughly 65GB+ of VRAM (e.g. an 80GB GPU like an A100/H100, or multiple GPUs). Quantized GGUF/AWQ/GPTQ builds (4-bit) shrink this to roughly 20GB, letting it run on a single 24GB consumer GPU or be served cheaply by API providers.