DeepSeek-R1-Distill-Qwen-32B

Name: DeepSeek-R1-Distill-Qwen-32B
Author: DeepSeek

A 32B reasoning model distilled from DeepSeek-R1 that beat OpenAI o1-mini on math and code.

Overview

DeepSeek-R1-Distill-Qwen-32B is one of six dense distilled models DeepSeek open-sourced on January 20, 2025, alongside the full 671B DeepSeek-R1. It takes a Qwen2.5-32B base and fine-tunes it on roughly 800,000 reasoning traces generated by DeepSeek-R1, transferring R1's step-by-step "think-then-answer" behavior into a model small enough to self-host on a single high-memory GPU.

The pitch at launch was that you no longer needed a frontier-scale model to get strong reasoning. On DeepSeek's own evaluations, DeepSeek-R1-Distill-Qwen-32B outperformed OpenAI's o1-mini across math, science, and coding benchmarks, making it one of the strongest open-weight reasoning models in its size class at the time. It produces visible reasoning inside <think> tags before giving a final answer.

Because it is released under the MIT License (the underlying Qwen2.5 base carries Apache 2.0), the weights are freely downloadable from Hugging Face and have been widely re-hosted, quantized to GGUF/AWQ/GPTQ, and served by inference providers such as Groq, DeepInfra, Fireworks, Together, and Cloudflare Workers AI. DeepSeek recommends running it with a temperature around 0.6, top-p 0.95, and no system prompt.

Released	2025-01-20
License	MIT License (model weights). The base Qwen2.5-32B it was distilled from is originally licensed under Apache 2.0.
Weights	Open weights
Parameters	~32.8B (dense)
Context	131,072 tokens (128K)
Max output	32,768 tokens (recommended max generation length; some hosted providers cap lower)
Architecture	Dense transformer based on Qwen2.5-32B, supervised fine-tuned on 800k reasoning samples generated by DeepSeek-R1. It is a distilled chain-of-thought reasoning model, not RL-trained itself.
Knowledge cutoff	Inherits the Qwen2.5 base pretraining cutoff (DeepSeek did not publish a separate cutoff for the distill)
Modalities	text
Status	Available (open-weight, released January 2025; superseded by newer DeepSeek-R1 distill refreshes but never formally retired).

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.69 per 1M tokens per 1M tokens
Output	$0.69 per 1M tokens per 1M tokens

Pricing is set by third-party hosts, not DeepSeek (the weights are free to self-host). Example: Groq lists $0.69/1M for both input and output; lower-cost hosts such as DeepInfra have listed around $0.27/1M.

Pricing source ↗

Strengths

Strong competition-math and reasoning performance for its size — 72.6% on AIME 2024 and 94.3% on MATH-500, beating OpenAI o1-mini
Open MIT-licensed weights you can download, fine-tune, quantize, and self-host with no usage restrictions
Fits on a single 48GB+ GPU (or runs quantized on consumer hardware), making frontier-style reasoning locally affordable
128K-token context window for long problems, large codebases, and multi-step proofs
Broad ecosystem support: GGUF/AWQ/GPTQ quants plus hosting on Groq, DeepInfra, Fireworks, Together, and others

Best for

Local and on-prem reasoning assistants where data cannot leave the building
Competition-grade math and STEM problem solving (AIME, MATH-500-style tasks)
Code generation, debugging, and algorithmic problem solving
Building agents and tool-use pipelines on top of an inspectable, self-hostable reasoning model
Research and fine-tuning baselines for distillation and chain-of-thought work

How to access

Provider	Model ID
Groq ↗	`deepseek-r1-distill-qwen-32b`
Cloudflare Workers AI ↗	`@cf/deepseek-ai/deepseek-r1-distill-qwen-32b`
NVIDIA NIM ↗	`deepseek-ai/deepseek-r1-distill-qwen-32b`
OpenRouter ↗	`deepseek/deepseek-r1-distill-qwen-32b`

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-R1-0528-Qwen3-8Bcurrent	2025-05-29	131K	MIT
DeepSeek-R1-Distill-Llama-70B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-32B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-14B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Llama-8B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-7B	2025-01-20	—	Open weights
DeepSeek-R1-Distill-Qwen-1.5B	2025-01-20	—	Open weights

FAQ

Is DeepSeek-R1-Distill-Qwen-32B the same as DeepSeek-R1?

No. DeepSeek-R1 is the full 671B (MoE) reasoning model trained with reinforcement learning. DeepSeek-R1-Distill-Qwen-32B is a much smaller 32B dense model: it takes a Qwen2.5-32B base and fine-tunes it on about 800,000 reasoning examples generated by DeepSeek-R1. It inherits R1's reasoning style but is a distilled student model, not R1 itself.

What license is it under, and can I use it commercially?

The model weights are released under the MIT License, which permits commercial use, modification, and redistribution. The Qwen2.5-32B base it was distilled from is originally licensed under Apache 2.0. Always review both licenses for your specific use case.

How does it compare to OpenAI o1-mini?

On DeepSeek's published benchmarks, the 32B distill outperforms o1-mini on math and coding: 72.6% vs 63.6% on AIME 2024, 94.3% vs 90.0% on MATH-500, and 62.1% vs 60.0% on GPQA Diamond. That made it one of the strongest open-weight reasoning models in its size class at its January 2025 launch.

What hardware do I need to run it?

At full BF16/FP16 precision the ~32.8B parameters need roughly 65GB+ of VRAM (e.g. an 80GB GPU like an A100/H100, or multiple GPUs). Quantized GGUF/AWQ/GPTQ builds (4-bit) shrink this to roughly 20GB, letting it run on a single 24GB consumer GPU or be served cheaply by API providers.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// DeepSeek R1 Distill — every version

// FAQ