AI/TLDR

DeepSeek-R1-Distill-Llama-70B

R1 reasoning distilled onto a Llama 3.3 70B base — the largest of DeepSeek's January 2025 distill set.

Overview

DeepSeek-R1-Distill-Llama-70B is an open-weight reasoning model that DeepSeek released on January 20, 2025 alongside its flagship DeepSeek-R1. Instead of being a from-scratch model, it takes Meta's Llama-3.3-70B-Instruct as a base and fine-tunes it on roughly 800K reasoning samples generated by the full DeepSeek-R1. The result is a 70.6B dense model that produces R1-style chain-of-thought — thinking through a problem step by step before answering — while running on the same hardware and inference stacks already built for Llama 3.3 70B.

It is the largest of the six original DeepSeek-R1 distill checkpoints (the others target 1.5B / 7B / 14B / 32B Qwen bases and an 8B Llama base). DeepSeek's own evaluation table puts the Llama-70B distill at the top of that set on math and reasoning, scoring 70.0 on AIME 2024, 94.5 on MATH-500, and 65.2 on GPQA Diamond — competitive with much larger reasoning systems while being downloadable and self-hostable.

The model is published under the MIT license, which permits commercial use, modification, and further distillation; the underlying Llama 3.3 weights remain subject to Meta's Llama 3.3 Community License. After launch it was widely served by inference providers — Groq, Together, Fireworks, OpenRouter and others — though hosted availability has since narrowed: Groq, for example, deprecated the model in September 2025 and decommissioned it in February 2026, pointing users to newer alternatives. The weights themselves remain available on Hugging Face.

Released2025-01-20
LicenseMIT (base model derived from Llama-3.3-70B-Instruct, originally under the Llama 3.3 Community License)
WeightsOpen weights
Parameters70.6B (dense)
Context128K tokens
Max output32,768 tokens (recommended max generation length)
ArchitectureDense transformer (Llama 3.3 70B architecture), fine-tuned on DeepSeek-R1 reasoning traces
Knowledge cutoffNot separately disclosed by DeepSeek; inherits the Llama 3.3 base pretraining cutoff of December 2023
ModalitiesText
StatusAvailable as open weights; hosted access wound down at some providers (Groq deprecated it Sept 2025 and decommissioned it Feb 2026).

Benchmarks

  1. AIME 2024 (pass@1)70%
  2. MATH-500 (pass@1)94.5%
  3. GPQA Diamond (pass@1)65.2%
  4. LiveCodeBench (pass@1)57.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.80 / 1M tokens
Output$0.80 / 1M tokens

Open weights, so cost varies by host; example list price from OpenRouter. Free to self-host.

Pricing source ↗

Strengths

  • Strong math and step-by-step reasoning for its size: 94.5 on MATH-500 and 70.0 on AIME 2024 (pass@1) in DeepSeek's own evaluation
  • Drops into existing Llama 3.3 70B infrastructure — same architecture, tokenizer family, and serving stacks (vLLM, SGLang, TGI, Ollama, llama.cpp)
  • Open weights under a permissive MIT license, so it can be self-hosted, quantized, and further fine-tuned or distilled without API lock-in
  • Exposes its chain-of-thought between <think> tags, which is useful for debugging reasoning and for building transparent agent loops
  • Quantizes well to 4-bit, putting a 70B reasoning model within reach of a single high-memory GPU or a multi-GPU workstation

Best for

  • Self-hosted reasoning assistant for math, logic, and structured problem-solving where you want to keep data on-prem
  • Code generation and debugging that benefits from explicit step-by-step thinking (57.5 on LiveCodeBench pass@1)
  • A drop-in reasoning upgrade for teams already serving Llama 3.3 70B who want chain-of-thought without changing their stack
  • Generating reasoning traces / synthetic data to further distill smaller student models
  • Research and evaluation of open reasoning models under a permissive license

How to access

ProviderModel ID
OpenRouter ↗deepseek/deepseek-r1-distill-llama-70b
Groq (deprecated/decommissioned) ↗deepseek-r1-distill-llama-70b

DeepSeek R1 Distill — every version

The full lineage of the DeepSeek R1 Distill line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-R1-0528-Qwen3-8Bcurrent2025-05-29131KMIT
DeepSeek-R1-Distill-Llama-70B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-32B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-14B2025-01-20Open weights
DeepSeek-R1-Distill-Llama-8B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-7B2025-01-20Open weights
DeepSeek-R1-Distill-Qwen-1.5B2025-01-20Open weights

FAQ

Is DeepSeek-R1-Distill-Llama-70B the same as DeepSeek-R1?

No. DeepSeek-R1 is the full 671B-parameter Mixture-of-Experts reasoning model. The Llama-70B distill is a separate, much smaller 70.6B dense model: it takes Meta's Llama-3.3-70B-Instruct and fine-tunes it on reasoning samples generated by DeepSeek-R1. It mimics R1's chain-of-thought style at a fraction of the size, but it is not R1 itself.

What license is it released under, and can I use it commercially?

DeepSeek published the distill weights under the MIT license, which permits commercial use, modification, and redistribution. Because the model is derived from Llama-3.3-70B-Instruct, the underlying base weights are also subject to Meta's Llama 3.3 Community License, so review both before deploying.

What hardware do I need to run it?

It is a 70.6B dense model, so full-precision serving needs roughly 140GB+ of GPU memory (multiple GPUs). With 4-bit quantization it can fit on a single high-memory GPU or a multi-GPU workstation, and it runs in standard Llama 3.3 stacks such as vLLM, SGLang, Ollama, and llama.cpp.

Is it still available through hosted APIs?

The weights remain on Hugging Face and can be self-hosted indefinitely. Hosted API access has narrowed over time, though — Groq, an early host, deprecated the model in September 2025 and decommissioned it in February 2026, directing users to newer models. Check each provider's current catalog before relying on it.