DeepSeek-V2.5

Name: DeepSeek-V2.5
Author: DeepSeek

DeepSeek's September 2024 model that fused its V2 chat and Coder-V2 models into one open-weight MoE assistant.

Overview

DeepSeek-V2.5 is an open-weight large language model released by DeepSeek on September 5, 2024. It merged two earlier DeepSeek models — the general-purpose DeepSeek-V2-Chat (the 0628 checkpoint) and the code-focused DeepSeek-Coder-V2-Instruct (the 0724 checkpoint) — into a single assistant, so one model could handle everyday conversation and programming without users having to switch endpoints.

Under the hood DeepSeek-V2.5 keeps the DeepSeek-V2 architecture: a Mixture-of-Experts transformer with 236 billion total parameters of which about 21 billion are active per token, plus Multi-head Latent Attention (MLA) that compresses the key-value cache for cheaper inference. It is a text-only, non-reasoning model and supports a 128K-token context. The code is released under the MIT License and the weights under the DeepSeek Model License, which permits commercial use.

DeepSeek shipped a revised checkpoint, DeepSeek-V2.5-1210, on December 10, 2024, which improved math (MATH-500 from 74.8 to 82.8) and coding (LiveCodeBench from 29.2 to 34.38) along with file-upload and web-summarization tweaks. Shortly after, DeepSeek-V3 launched (December 26, 2024) and became the served model, making V2.5 a transitional release between the V2 and V3 generations.

Released	2024-09-05
License	Open weights — code under MIT License; model weights under the DeepSeek Model License (commercial use permitted).
Weights	Open weights
Parameters	236B total / 21B activated per token (Mixture-of-Experts)
Context	128K tokens (advertised; the published Hugging Face config sets max_model_len to 8,192)
Max output	Not officially published by DeepSeek
Architecture	Mixture-of-Experts (DeepSeekMoE) transformer with Multi-head Latent Attention (MLA) for a compressed KV cache, inherited from DeepSeek-V2. V2.5 is a post-training merge of DeepSeek-V2-Chat (V2-0628) and DeepSeek-Coder-V2-Instruct (Coder-V2-0724) into a single instruction-tuned model.
Knowledge cutoff	Not officially published by DeepSeek
Modalities	Text
Status	Superseded. DeepSeek-V2.5 was the production model in late 2024 (revised as V2.5-1210 on 2024-12-10) and was replaced by DeepSeek-V3 from December 2024 onward. The open weights remain on Hugging Face, but it is no longer DeepSeek's served model.

Benchmarks

HumanEval (Python)89pass@1 %
ArenaHard76.2win rate %
MMLU-Pro65.83%
AlpacaEval 2.050.5LC win rate %
LiveCodeBench (01-09)41.8%
MT-Bench9.02score (0-10)
AlignBench8.04score (0-10)

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

Combined strong general chat and coding in one model, removing the need to choose between a chat and a coder endpoint
Backward-compatible API: existing deepseek-chat and deepseek-coder calls pointed to the new merged model
MoE with MLA keeps only ~21B of 236B parameters active per token, making inference relatively cheap for its size
Improved human-preference alignment and safety over DeepSeek-V2-0628 (overall safety score rose to 82.6% from 74.4%)
Open weights with commercial-use permission, downloadable from Hugging Face
Practical developer features carried over: function calling, JSON output, and Fill-in-the-Middle (FIM) code completion

Best for

General-purpose chat assistants that also need solid coding help
Code generation, completion, and Fill-in-the-Middle tasks in developer tools
Self-hosted or on-prem deployments where open weights and commercial licensing matter
Function-calling and structured-JSON workflows
A reproducible historical baseline for studying the DeepSeek V2-to-V3 transition

How to access

Provider	Model ID
DeepSeek ↗	`deepseek-chat`
Hugging Face ↗	`deepseek-ai/DeepSeek-V2.5`
Ollama ↗	`deepseek-v2.5`

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-V3.2current	2025-12-01	—	Open weights
DeepSeek-V3.2-Speciale	2025-12-01	—	Open weights
DeepSeek-V3.2-Exp	2025-09-29	—	Open weights
DeepSeek-V3.1-Terminus	2025-09-22	—	Open weights
DeepSeek-V3.1	2025-08-21	—	Open weights
DeepSeek-V3-0324	2025-03-24	—	Open weights
DeepSeek-V3	2024-12-26	—	Open weights
DeepSeek-V2.5	2024-09-05	—	Open weights
DeepSeek-V2	2024-05	—	Open weights

FAQ

What is DeepSeek-V2.5?

DeepSeek-V2.5 is an open-weight large language model released on September 5, 2024 that merged DeepSeek-V2-Chat (the 0628 checkpoint) and DeepSeek-Coder-V2-Instruct (the 0724 checkpoint) into a single model good at both general chat and coding. It is a 236B-total / 21B-active Mixture-of-Experts model with 128K context.

Is DeepSeek-V2.5 still the current model?

No. DeepSeek-V2.5 was revised as DeepSeek-V2.5-1210 on December 10, 2024, then superseded by DeepSeek-V3 (released December 26, 2024) and later versions. The open weights remain available on Hugging Face, but it is no longer DeepSeek's served model.

What license does DeepSeek-V2.5 use, and can I use it commercially?

The code is released under the MIT License and the model weights under the DeepSeek Model License, which permits commercial use including deployment, fine-tuning, and building products on top of the model.

How does DeepSeek-V2.5 perform on benchmarks?

DeepSeek's model card reports HumanEval (Python) 89, ArenaHard 76.2, MMLU-Pro 65.83, AlpacaEval 2.0 50.5, LiveCodeBench (01-09) 41.8, plus MT-Bench 9.02 and AlignBench 8.04 on their 0-10 scales — improving on both source models on most tests.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// DeepSeek V3 — every version

// FAQ