AI/TLDR

DeepSeek-R1-Distill-Qwen-14B

A 14B open-weight reasoning model that fine-tunes Qwen2.5-14B on 800K chain-of-thought traces from DeepSeek-R1, putting R1-style step-by-step reasoning on a single consumer GPU.

Overview

DeepSeek-R1-Distill-Qwen-14B is a 14-billion-parameter open-weight reasoning model that DeepSeek released on January 20, 2025, alongside its flagship DeepSeek-R1. Rather than being trained with reinforcement learning from scratch, it is a distillation: DeepSeek took the open-source Qwen2.5-14B model as the base and fine-tuned it on roughly 800,000 reasoning samples generated by the full DeepSeek-R1. The result is a dense model that produces the same explicit, visible chain-of-thought (wrapped in <think> tags) as R1, but small enough to run on a single high-memory consumer or workstation GPU.

It is one of six checkpoints in the original DeepSeek-R1-Distill family, which spans 1.5B, 7B, 8B, 14B, 32B (all Qwen or Llama-based) and a 70B Llama variant. The 14B sits in the middle of the lineup — markedly stronger than the 7B and 8B distills on math and code, while needing far less memory than the 32B and 70B versions. DeepSeek reported that the distilled dense models reach state-of-the-art results for their size and that this 14B checkpoint outperforms OpenAI's o1-mini on several reasoning benchmarks, an unusually strong showing for a model in this parameter range.

Because it inherits Qwen2.5-14B's architecture, the model is a standard dense transformer (not a Mixture-of-Experts) with a 128K-token context window, and DeepSeek caps generation at 32,768 tokens. DeepSeek recommends running it with a temperature around 0.6 (within a 0.5-0.7 range), top-p 0.95, and no system prompt — putting all instructions in the user turn — to avoid repetition loops. The weights are published on Hugging Face under the MIT License, and the model is widely available through inference providers such as Together AI and OpenRouter.

Released2025-01-20
LicenseMIT License (DeepSeek code repository and model weights). The model is derived from Qwen2.5-14B, which is originally released under the Apache 2.0 License; commercial use and modification are permitted.
WeightsOpen weights
Parameters14B (dense; based on Qwen2.5-14B)
Context128K (some serving platforms list 131K / 131,072 tokens)
Max output32,768 tokens (DeepSeek's recommended maximum generation length for R1 models)
ArchitectureDense decoder-only transformer (not Mixture-of-Experts). The base model is Qwen2.5-14B, fine-tuned by DeepSeek on ~800,000 reasoning samples generated by the full DeepSeek-R1. It produces an explicit chain-of-thought wrapped in <think>...</think> tags before the final answer. DeepSeek recommends temperature 0.5-0.7 (0.6 recommended), top-p 0.95, and no system prompt (instructions in the user turn) to avoid repetition.
Knowledge cutoffNot separately disclosed by DeepSeek; the base Qwen2.5-14B was pretrained on data up to 2024.
ModalitiesText
Statusavailable

Benchmarks

  1. AIME 2024 (pass@1)69.7%
  2. MATH-500 (pass@1)93.9%
  3. GPQA Diamond (pass@1)59.1%
  4. LiveCodeBench (pass@1)53.1%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.18 / 1M tokens (Together AI) per 1M tokens
Output$0.18 / 1M tokens (Together AI) per 1M tokens

DeepSeek does not run a first-party hosted endpoint for the distill models; pricing is set by third-party inference providers. Together AI charges a flat $0.18 per million input and output tokens. The weights are MIT-licensed and free to self-host. Reasoning models emit verbose chain-of-thought, so output-token usage per request tends to be higher than non-reasoning models.

Pricing source ↗

Strengths

  • Strong math and reasoning for its size — 93.9% on MATH-500 and 69.7% pass@1 on AIME 2024, with DeepSeek reporting it beats OpenAI o1-mini on several benchmarks
  • Emits an explicit, readable chain-of-thought (in <think> tags) you can inspect and verify, distilled directly from full DeepSeek-R1
  • Open weights under the permissive MIT License, allowing commercial use, modification, and self-hosting
  • Compact enough (14B, dense) to run on a single high-memory consumer or workstation GPU, especially when quantized to 4-bit (GGUF / AWQ builds are widely available)
  • Large 128K-token context window inherited from the Qwen2.5-14B base
  • Cheap to serve through third-party APIs (e.g. ~$0.18 per million tokens on Together AI), with no per-token cost when self-hosted

Best for

  • Self-hosted, privacy-sensitive reasoning workloads where a small open-weight model with visible chain-of-thought is preferred over a closed API
  • Multi-step math, logic, and competition-style problem solving (AIME / MATH-class problems)
  • Coding and algorithmic-reasoning tasks within assistant or agent pipelines
  • Generating reasoning traces and synthetic data on commodity hardware without calling a frontier API
  • Local experimentation and research on distilled reasoning models, including further fine-tuning under the MIT License
  • Cost-sensitive deployments that need R1-style reasoning at a fraction of the size and price of the full DeepSeek-R1

How to access

ProviderModel ID
Together AI ↗deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
OpenRouter ↗deepseek/deepseek-r1-distill-qwen-14b
Hugging Face (self-host) ↗deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-R1-0528-Qwen3-8Bcurrent2025-05-29131KMIT
DeepSeek-R1-Distill-Llama-70B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-32B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-14B2025-01-20Open weights
DeepSeek-R1-Distill-Llama-8B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-7B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-1.5B2025-01-20Open weights

FAQ

Is DeepSeek-R1-Distill-Qwen-14B the same as DeepSeek-R1?

No. DeepSeek-R1 is a very large Mixture-of-Experts reasoning model. DeepSeek-R1-Distill-Qwen-14B is a much smaller 14B dense model: DeepSeek took the open-source Qwen2.5-14B and fine-tuned it on about 800,000 reasoning examples generated by the full R1. It mimics R1's chain-of-thought reasoning style at a fraction of the size, but it is not as capable as the full R1 model.

What license is DeepSeek-R1-Distill-Qwen-14B released under?

The DeepSeek code repository and the model weights are released under the MIT License, which permits commercial use, modification, and redistribution. Because the model is derived from Qwen2.5-14B (originally Apache 2.0), users should also respect that base license. The weights are downloadable on Hugging Face.

What hardware do I need to run it?

As a 14B dense model it is much lighter than the full R1. In full 16-bit precision it needs roughly 28-30GB of GPU memory, but quantized 4-bit GGUF or AWQ builds (widely available from the community) bring that down to around 10GB, letting it run on a single high-memory consumer or workstation GPU.

How good is it compared to other models?

DeepSeek reported that the distilled dense models set new state-of-the-art results for their size, and that the 14B checkpoint outperforms OpenAI's o1-mini on several reasoning benchmarks. It scores 69.7% on AIME 2024 (pass@1), 93.9% on MATH-500, 59.1% on GPQA Diamond, and 53.1% on LiveCodeBench — strong math and reasoning numbers for a 14B model, though it trails larger frontier reasoning models.