DeepSeek-V2

Name: DeepSeek-V2
Author: DeepSeek

The 236B Mixture-of-Experts model that started China's LLM price war

Overview

DeepSeek-V2 is an open-weight Mixture-of-Experts large language model released by Chinese AI lab DeepSeek in May 2024. It has 236 billion total parameters but activates only 21 billion per token, which is what lets a model this large run cheaply. It supports a 128K-token context window and was pretrained on 8.1 trillion tokens of text and code.

Its two headline ideas are DeepSeekMoE and Multi-head Latent Attention (MLA). DeepSeekMoE splits the feed-forward layers into 2 shared experts plus 160 routed experts and uses just 6 of them per token, so most of the network sits idle on any given step. MLA compresses the key-value cache that normally dominates inference memory, and DeepSeek reports it cuts that cache by 93.3% and boosts maximum throughput 5.76x compared with the older dense DeepSeek 67B. Together these techniques are why DeepSeek-V2 could be served so cheaply.

DeepSeek-V2 is best remembered for triggering a price war: its launch API rates were so low (the Financial Times reported roughly 2 RMB per million output tokens) that other Chinese labs quickly cut their own prices. The model is now discontinued — DeepSeek replaced it with DeepSeek-V2.5 in September 2024 and the much larger DeepSeek-V3 in December 2024 — but its open weights remain on Hugging Face and its MLA + MoE recipe carried directly into those successors.

Released	2024-05
License	DeepSeek Model License (source-available; commercial use permitted). Repository code is MIT-licensed.
Weights	Open weights
Parameters	236B total, 21B activated per token (Mixture-of-Experts)
Context	128K tokens
Architecture	Mixture-of-Experts (MoE) decoder-only Transformer using DeepSeekMoE for the feed-forward layers (2 shared experts + 160 routed experts, 6 activated per token) and Multi-head Latent Attention (MLA), which compresses the key-value cache via low-rank joint compression. 236B total parameters with 21B activated per token; pretrained on 8.1 trillion tokens. DeepSeek reports a 93.3% KV-cache reduction and 5.76x higher maximum generation throughput versus the earlier dense DeepSeek 67B.
Knowledge cutoff	Not officially published by DeepSeek
Modalities	text
Status	Discontinued — superseded by DeepSeek-V2.5 (September 2024) and DeepSeek-V3 (December 2024). Open weights remain available on Hugging Face; the original hosted API endpoint has long since moved to newer models.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

Extremely cheap to serve for its size — only 21B of 236B parameters activate per token
Multi-head Latent Attention cuts KV-cache memory by 93.3%, enabling long contexts at low cost
Strong coding and math scores for an open model of its era (HumanEval 81.1, GSM8K 92.2 on the Chat model)
128K-token context window with reliable long-context retrieval
Open weights under a commercially-permissive license, with an MIT-licensed code repository
Strong bilingual (Chinese + English) performance — C-Eval 81.7, CMMLU 84.0 on the base model

Best for

Low-cost, high-throughput text generation and chat at scale
Coding assistance and code generation
Math and reasoning tasks
Long-document question answering and summarization within a 128K window
Chinese and English bilingual applications
Self-hosting on your own GPUs (e.g. via vLLM) when you need open weights and data control
A historical reference / baseline for studying MoE and MLA architectures

How to access

Provider	Model ID
DeepSeek ↗	`deepseek-chat (historical; endpoint has since moved to newer models)`
Hugging Face ↗	`deepseek-ai/DeepSeek-V2`

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-V3.2current	2025-12-01	—	Open weights
DeepSeek-V3.2-Speciale	2025-12-01	—	Open weights
DeepSeek-V3.2-Exp	2025-09-29	—	Open weights
DeepSeek-V3.1-Terminus	2025-09-22	—	Open weights
DeepSeek-V3.1	2025-08-21	—	Open weights
DeepSeek-V3-0324	2025-03-24	—	Open weights
DeepSeek-V3	2024-12-26	—	Open weights
DeepSeek-V2.5	2024-09-05	—	Open weights
DeepSeek-V2	2024-05	—	Open weights

FAQ

What is DeepSeek-V2?

DeepSeek-V2 is an open-weight Mixture-of-Experts large language model released by DeepSeek in May 2024. It has 236 billion total parameters but activates only 21 billion per token, supports a 128K-token context window, and was trained on 8.1 trillion tokens.

Is DeepSeek-V2 still available?

Its open weights are still downloadable on Hugging Face, but the model is discontinued. DeepSeek replaced it with DeepSeek-V2.5 in September 2024 and DeepSeek-V3 in December 2024, and the hosted API now serves newer models.

What made DeepSeek-V2 important?

Two things. Technically, it introduced Multi-head Latent Attention (MLA) and the DeepSeekMoE design, cutting KV-cache memory by 93.3% and boosting throughput 5.76x versus the dense DeepSeek 67B. Commercially, its very low launch price set off a price war among Chinese AI labs.

How big is DeepSeek-V2 and how much runs at once?

It has 236B total parameters but only 21B activate per token because it is a Mixture-of-Experts model. Each MoE layer uses 2 shared experts plus 160 routed experts and selects just 6 of them per token, which keeps inference cheap despite the large total size.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// DeepSeek V3 — every version

// FAQ