AI/TLDR

DeepSeek-V2

The 236B Mixture-of-Experts model that started China's LLM price war

Overview

DeepSeek-V2 is an open-weight Mixture-of-Experts large language model released by Chinese AI lab DeepSeek in May 2024. It has 236 billion total parameters but activates only 21 billion per token, which is what lets a model this large run cheaply. It supports a 128K-token context window and was pretrained on 8.1 trillion tokens of text and code.

Its two headline ideas are DeepSeekMoE and Multi-head Latent Attention (MLA). DeepSeekMoE splits the feed-forward layers into 2 shared experts, which run for every token, plus 160 routed experts, of which only 6 are selected per token, so most of the network sits idle on any given step. MLA compresses the key-value cache that normally dominates inference memory. DeepSeek reports a 93.3% smaller KV cache and 5.76x the maximum generation throughput of the older dense DeepSeek 67B. Together these techniques are why DeepSeek-V2 could be served so cheaply.
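The routing step described above can be sketched in a few lines of Python. This is a toy illustration, not DeepSeek's implementation: the helper name `route_token`, the random stand-in gate scores, and the plain softmax-then-top-k gating are assumptions made for the example. Only the expert counts (2 shared, 160 routed, 6 selected) come from the text.

```python
import math
import random

NUM_SHARED = 2    # shared experts: always active (from the text)
NUM_ROUTED = 160  # routed experts per MoE layer (from the text)
TOP_K = 6         # routed experts selected per token (from the text)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits, top_k=TOP_K):
    """Return {routed_expert_index: gate_weight} for the top_k experts.

    Hypothetical helper: softmax over all routed experts, then keep the
    top_k highest-scoring ones. Load balancing is deliberately left out.
    """
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return {i: probs[i] for i in ranked[:top_k]}


random.seed(0)
# Stand-in gate scores; a real router computes these from the token's hidden state.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED)]
chosen = route_token(logits)

active = NUM_SHARED + len(chosen)   # experts that actually run for this token
total = NUM_SHARED + NUM_ROUTED     # experts present in the layer
print(f"routed experts chosen: {sorted(chosen)}")
print(f"experts active for this token: {active} of {total}")
```

In this sketch 8 of the 162 experts in a layer run for any one token: the 2 shared experts plus the 6 routed experts with the highest gate scores.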

DeepSeek-V2 is best remembered for triggering a price war: its launch API rates were so low (the Financial Times reported roughly 2 RMB per million output tokens) that other Chinese labs quickly cut their own prices. The model is now discontinued — DeepSeek replaced it with DeepSeek-V2.5 in September 2024 and the much larger DeepSeek-V3 in December 2024 — but its open weights remain on Hugging Face and its MLA + MoE recipe carried directly into those successors.

Released: 2024-05
License: DeepSeek Model License (source-available; commercial use permitted). Repository code is MIT-licensed.
Weights: Open weights
Parameters: 236B total, 21B activated per token (Mixture-of-Experts)
Context: 128K tokens
Architecture: Mixture-of-Experts (MoE) decoder-only Transformer using DeepSeekMoE for the feed-forward layers (2 shared experts + 160 routed experts, 6 activated per token) and Multi-head Latent Attention (MLA), which compresses the key-value cache via low-rank joint compression. 236B total parameters with 21B activated per token; pretrained on 8.1 trillion tokens. DeepSeek reports a 93.3% KV-cache reduction and 5.76x higher maximum generation throughput versus the earlier dense DeepSeek 67B.
Knowledge cutoff: Not officially published by DeepSeek
Modalities: text
Status: Discontinued; superseded by DeepSeek-V2.5 (September 2024) and DeepSeek-V3 (December 2024). Open weights remain available on Hugging Face; the original hosted API endpoint has long since moved to newer models.
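To make "low-rank joint compression" concrete, here is a toy sketch of the idea: cache one small latent vector per token, then rebuild keys and values from it when attention needs them. Everything specific here is an assumption for illustration: the dimensions (hidden size 16, latent size 4), the random weights, and the helper names. It also leaves out multiple heads and positional encoding, so it is not a full MLA implementation.

```python
import random

random.seed(0)

D_MODEL = 16   # toy hidden size (assumption, not DeepSeek-V2's real value)
D_LATENT = 4   # toy compressed KV size (assumption)
NUM_TOKENS = 5


def rand_matrix(rows, cols):
    """Random weight matrix as a list of rows."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


W_down = rand_matrix(D_LATENT, D_MODEL)   # one down-projection shared by K and V
W_up_k = rand_matrix(D_MODEL, D_LATENT)   # expands the latent into a key
W_up_v = rand_matrix(D_MODEL, D_LATENT)   # expands the latent into a value

hidden_states = [
    [random.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(NUM_TOKENS)
]

# Only the small latent vector is stored per token.
latent_cache = [matvec(W_down, h) for h in hidden_states]

# Keys and values are reconstructed from the cache on demand.
keys = [matvec(W_up_k, c) for c in latent_cache]
values = [matvec(W_up_v, c) for c in latent_cache]

plain_floats = 2 * D_MODEL * NUM_TOKENS   # caching full keys and values
latent_floats = D_LATENT * NUM_TOKENS     # caching the latent only
saving = 1 - latent_floats / plain_floats
print(f"cached floats: {latent_floats} vs {plain_floats} ({saving:.1%} smaller)")
```

The saving in this sketch follows purely from the made-up dimensions. It is not meant to reproduce the reported 93.3%, which DeepSeek measured against DeepSeek 67B.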

Benchmarks

  1. MMLU (Base): 78.5%
  2. BBH (Base): 78.9%
  3. C-Eval (Base): 81.7%
  4. CMMLU (Base): 84.0%
  5. GSM8K (Chat): 92.2%
  6. HumanEval (Chat): 81.1%
  7. MATH (Chat): 53.9%
  8. MMLU (Chat): 77.8%

Scores are percentages on a 0–100 scale; higher is better.

Strengths

  • Extremely cheap to serve for its size — only 21B of 236B parameters activate per token
  • Multi-head Latent Attention cuts KV-cache memory by 93.3%, enabling long contexts at low cost
  • Strong coding and math scores for an open model of its era (HumanEval 81.1, GSM8K 92.2 on the Chat model)
  • 128K-token context window with reliable long-context retrieval
  • Open weights under a commercially-permissive license, with an MIT-licensed code repository
  • Strong bilingual (Chinese + English) performance — C-Eval 81.7, CMMLU 84.0 on the base model

Best for

  • Low-cost, high-throughput text generation and chat at scale
  • Coding assistance and code generation
  • Math and reasoning tasks
  • Long-document question answering and summarization within a 128K window
  • Chinese and English bilingual applications
  • Self-hosting on your own GPUs (e.g. via vLLM) when you need open weights and data control
  • A historical reference / baseline for studying MoE and MLA architectures
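For the self-hosting case above, a rough sizing estimate and a vLLM sketch may help. The assumptions are labeled in the code: 2 bytes per parameter (BF16 weights), 8 GPUs on one node, and a reduced `max_model_len`. The `generate` helper and `ENGINE_KWARGS` are hypothetical names for this example. The vLLM import sits inside the function, so the sizing part runs without vLLM or GPUs.

```python
TOTAL_PARAMS = 236e9   # total parameters (from the text)
BYTES_PER_PARAM = 2    # assumption: BF16/FP16 weights, no quantization

# All experts must be resident in memory even though only 21B activate per token.
weight_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights alone: ~{weight_gb:.0f} GB, before KV cache and activations")

# Assumed engine settings; adjust for your own hardware.
ENGINE_KWARGS = {
    "model": "deepseek-ai/DeepSeek-V2",  # Hugging Face model ID listed on this page
    "trust_remote_code": True,
    "tensor_parallel_size": 8,           # assumption: 8 GPUs on one node
    "max_model_len": 8192,               # assumption: cap context to save memory
}


def generate(prompt: str, max_tokens: int = 128) -> str:
    """Load the model with vLLM and return one completion (requires GPUs)."""
    from vllm import LLM, SamplingParams  # imported lazily: heavy dependency

    llm = LLM(**ENGINE_KWARGS)
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Under these assumptions the weights alone come to about 472 GB, so the sparse activation lowers compute per token but not the memory needed to hold the model.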

How to access

Provider | Model ID
DeepSeek | deepseek-chat (historical; endpoint has since moved to newer models)
Hugging Face | deepseek-ai/DeepSeek-V2

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first.

Version | Released | License
DeepSeek-V3.2 (current) | 2025-12-01 | Open weights
DeepSeek-V3.2-Speciale | 2025-12-01 | Open weights
DeepSeek-V3.2-Exp | 2025-09-29 | Open weights
DeepSeek-V3.1-Terminus | 2025-09-22 | Open weights
DeepSeek-V3.1 | 2025-08-21 | Open weights
DeepSeek-V3-0324 | 2025-03-24 | Open weights
DeepSeek-V3 | 2024-12-26 | Open weights
DeepSeek-V2.5 | 2024-09-05 | Open weights
DeepSeek-V2 | 2024-05 | Open weights

FAQ

What is DeepSeek-V2?

DeepSeek-V2 is an open-weight Mixture-of-Experts large language model released by DeepSeek in May 2024. It has 236 billion total parameters but activates only 21 billion per token, supports a 128K-token context window, and was trained on 8.1 trillion tokens.

Is DeepSeek-V2 still available?

Its open weights are still downloadable on Hugging Face, but the model is discontinued. DeepSeek replaced it with DeepSeek-V2.5 in September 2024 and DeepSeek-V3 in December 2024, and the hosted API now serves newer models.

What made DeepSeek-V2 important?

Two things. Technically, it introduced Multi-head Latent Attention (MLA) and paired it with the DeepSeekMoE design, cutting KV-cache memory by 93.3% and raising maximum generation throughput to 5.76x that of the dense DeepSeek 67B. Commercially, its very low launch price set off a price war among Chinese AI labs.

How big is DeepSeek-V2 and how much runs at once?

It has 236B total parameters but only 21B activate per token because it is a Mixture-of-Experts model. Each MoE layer has 2 shared experts, which always run, plus 160 routed experts, of which just 6 are selected per token. This keeps inference cheap despite the large total size.
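The headline figures are easier to compare as ratios. The short calculation below uses only numbers quoted on this page; the variable names are made up for the example.

```python
TOTAL_B, ACTIVE_B = 236, 21   # billions of parameters (from the text)
ROUTED, TOP_K = 160, 6        # routed experts per layer and how many are selected
KV_REDUCTION = 0.933          # reported KV-cache reduction vs DeepSeek 67B

active_share = ACTIVE_B / TOTAL_B    # fraction of all parameters used per token
routed_share = TOP_K / ROUTED        # fraction of routed experts selected per token
kv_shrink = 1 / (1 - KV_REDUCTION)   # how many times smaller the KV cache is

print(f"parameters active per token: {active_share:.2%}")
print(f"routed experts selected per token: {routed_share:.2%}")
print(f"KV cache is about {kv_shrink:.1f}x smaller")
```

Roughly 9% of the parameters are active per token, which is higher than the 3.75% share of routed experts selected. That gap is expected, since the shared experts and the non-expert parts of the network (such as attention) run for every token. A 93.3% reduction also means the cache is roughly 15 times smaller than the baseline.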