DeepSeek-Coder-V2

Name: DeepSeek-Coder-V2
Author: DeepSeek

Open Mixture-of-Experts code model (236B/21B and 16B/2.4B) that reached GPT-4-Turbo-class coding across 338 languages.

Overview

DeepSeek-Coder-V2 is an open-weight, code-specialized Mixture-of-Experts model from DeepSeek, released on June 17, 2024. It comes in two sizes: a flagship with 236B total parameters and 21B active, and a smaller DeepSeek-Coder-V2-Lite with 16B total and 2.4B active. Both ship as base and instruction-tuned checkpoints, and were further pre-trained from an intermediate DeepSeek-V2 checkpoint on an additional 6 trillion tokens of code and math data.

Compared with the original DeepSeek Coder, DeepSeek-Coder-V2 expands programming-language coverage from 86 to 338 languages and extends the context window from 16K to 128K tokens. DeepSeek positions it as the first open-source model to break the barrier of closed-source code intelligence: on coding and math benchmarks it reaches performance comparable to GPT-4-Turbo and reportedly edges out Claude 3 Opus and Gemini 1.5 Pro of that era.

On reported benchmarks the 236B Instruct model scores 90.2% on HumanEval, 76.2% on MBPP+, 43.4% on LiveCodeBench, 75.7% on MATH, and 94.9% on GSM8K, while keeping solid general-language ability at 79.2% MMLU. The weights are downloadable from Hugging Face under an MIT code license plus a DeepSeek Model License that permits commercial use; the hosted API endpoint was later merged into DeepSeek-V2.5 in September 2024.

Released	2024-06-17
License	MIT (code) + DeepSeek Model License — commercial use permitted
Weights	Open weights
Parameters	236B total · 21B active (Lite: 16B total · 2.4B active)
Context	128K
Max output	Undisclosed
Architecture	Mixture-of-Experts (DeepSeekMoE), further pre-trained from a DeepSeek-V2 checkpoint on an extra 6 trillion tokens.
Knowledge cutoff	November 2023
Modalities	Text
Status	Generally available

Benchmarks

HumanEval (DeepSeek-Coder-V2-Instruct)90.2%
MBPP+ (EvalPlus)76.2%
LiveCodeBench43.4%
SWE-Bench12.7%
Aider (code editing)73.7%
MATH75.7%
GSM8K94.9%
MMLU79.2%
Arena-Hard65%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	Free (open weights) self-hosted
Output	Free (open weights) self-hosted

Weights are downloadable from Hugging Face for self-hosting; the original hosted deepseek-coder API endpoint was merged into DeepSeek-V2.5 in September 2024.

Pricing source ↗

Strengths

GPT-4-Turbo-class coding performance from a fully open-weight model (90.2% HumanEval)
Broad language coverage — 338 programming languages, up from 86 in DeepSeek Coder v1
128K-token context for whole-repository and long-file reasoning
Strong mathematical reasoning: 75.7% MATH and 94.9% GSM8K
Two sizes — a 236B/21B-active flagship and a lightweight 16B/2.4B-active Lite — plus base and instruct checkpoints
Permissive licensing (MIT code + commercial-use model license) for self-hosting

Best for

Code generation and autocompletion across many languages
Code repair and bug fixing
Mathematical and algorithmic reasoning
Repository-scale code understanding using the 128K context
Self-hosted, on-prem coding assistants where open weights are required
Cost-efficient open-weight alternative to closed coding APIs

How to access

Provider	Model ID
Hugging Face (download weights) ↗	`deepseek-ai/DeepSeek-Coder-V2-Instruct`
DeepSeek Platform (merged into DeepSeek-V2.5) ↗	`deepseek-coder`

DeepSeek Coder — every version

The full lineage of the DeepSeek Coder line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-Coder-V2current	2024-06-17	—	Open weights
DeepSeek Coder	2023-11-02	—	Open weights

FAQ

What is DeepSeek-Coder-V2?

DeepSeek-Coder-V2 is an open-weight, code-specialized Mixture-of-Experts model released by DeepSeek on June 17, 2024. It comes as a 236B-total / 21B-active flagship and a 16B-total / 2.4B-active Lite version, each with base and instruction-tuned checkpoints, and was further pre-trained from a DeepSeek-V2 checkpoint on an extra 6 trillion tokens.

How well does DeepSeek-Coder-V2 perform on coding benchmarks?

The 236B Instruct model scores 90.2% on HumanEval, 76.2% on MBPP+, 43.4% on LiveCodeBench, and 73.7% on Aider, with 75.7% MATH and 94.9% GSM8K for math. DeepSeek reports performance comparable to GPT-4-Turbo and ahead of Claude 3 Opus and Gemini 1.5 Pro of that period.

How many languages and how much context does it support?

It supports 338 programming languages (up from 86 in DeepSeek Coder v1) and a 128K-token context window (extended from 16K).

Is DeepSeek-Coder-V2 free and open source?

Yes. The weights are downloadable from Hugging Face under an MIT license for code plus a DeepSeek Model License that permits commercial use. The original hosted deepseek-coder API endpoint was later merged into DeepSeek-V2.5 in September 2024.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// DeepSeek Coder — every version

// FAQ