Overview
DeepSeek-V2.5 is an open-weight large language model released by DeepSeek on September 5, 2024. It merged two earlier DeepSeek models — the general-purpose DeepSeek-V2-Chat (the 0628 checkpoint) and the code-focused DeepSeek-Coder-V2-Instruct (the 0724 checkpoint) — into a single assistant, so one model could handle everyday conversation and programming without users having to switch endpoints.
Under the hood DeepSeek-V2.5 keeps the DeepSeek-V2 architecture: a Mixture-of-Experts transformer with 236 billion total parameters of which about 21 billion are active per token, plus Multi-head Latent Attention (MLA) that compresses the key-value cache for cheaper inference. It is a text-only, non-reasoning model and supports a 128K-token context. The code is released under the MIT License and the weights under the DeepSeek Model License, which permits commercial use.
DeepSeek shipped a revised checkpoint, DeepSeek-V2.5-1210, on December 10, 2024, which improved math (MATH-500 from 74.8 to 82.8) and coding (LiveCodeBench from 29.2 to 34.38) along with file-upload and web-summarization tweaks. Shortly after, DeepSeek-V3 launched (December 26, 2024) and became the served model, making V2.5 a transitional release between the V2 and V3 generations.
| Released | 2024-09-05 |
|---|---|
| License | Open weights — code under MIT License; model weights under the DeepSeek Model License (commercial use permitted). |
| Weights | Open weights |
| Parameters | 236B total / 21B activated per token (Mixture-of-Experts) |
| Context | 128K tokens (advertised; the published Hugging Face config sets max_model_len to 8,192) |
| Max output | Not officially published by DeepSeek |
| Architecture | Mixture-of-Experts (DeepSeekMoE) transformer with Multi-head Latent Attention (MLA) for a compressed KV cache, inherited from DeepSeek-V2. V2.5 is a post-training merge of DeepSeek-V2-Chat (V2-0628) and DeepSeek-Coder-V2-Instruct (Coder-V2-0724) into a single instruction-tuned model. |
| Knowledge cutoff | Not officially published by DeepSeek |
| Modalities | Text |
| Status | Superseded. DeepSeek-V2.5 was the production model in late 2024 (revised as V2.5-1210 on 2024-12-10) and was replaced by DeepSeek-V3 from December 2024 onward. The open weights remain on Hugging Face, but it is no longer DeepSeek's served model. |
Benchmarks
- HumanEval (Python)89pass@1 %
- ArenaHard76.2win rate %
- MMLU-Pro65.83%
- AlpacaEval 2.050.5LC win rate %
- LiveCodeBench (01-09)41.8%
- MT-Bench9.02score (0-10)
- AlignBench8.04score (0-10)
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- Combined strong general chat and coding in one model, removing the need to choose between a chat and a coder endpoint
- Backward-compatible API: existing deepseek-chat and deepseek-coder calls pointed to the new merged model
- MoE with MLA keeps only ~21B of 236B parameters active per token, making inference relatively cheap for its size
- Improved human-preference alignment and safety over DeepSeek-V2-0628 (overall safety score rose to 82.6% from 74.4%)
- Open weights with commercial-use permission, downloadable from Hugging Face
- Practical developer features carried over: function calling, JSON output, and Fill-in-the-Middle (FIM) code completion
Best for
- General-purpose chat assistants that also need solid coding help
- Code generation, completion, and Fill-in-the-Middle tasks in developer tools
- Self-hosted or on-prem deployments where open weights and commercial licensing matter
- Function-calling and structured-JSON workflows
- A reproducible historical baseline for studying the DeepSeek V2-to-V3 transition
How to access
| Provider | Model ID |
|---|---|
| DeepSeek ↗ | deepseek-chat |
| Hugging Face ↗ | deepseek-ai/DeepSeek-V2.5 |
| Ollama ↗ | deepseek-v2.5 |
DeepSeek V3 — every version
The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-V3.2current | 2025-12-01 | — | Open weights |
| DeepSeek-V3.2-Speciale | 2025-12-01 | — | Open weights |
| DeepSeek-V3.2-Exp | 2025-09-29 | — | Open weights |
| DeepSeek-V3.1-Terminus | 2025-09-22 | — | Open weights |
| DeepSeek-V3.1 | 2025-08-21 | — | Open weights |
| DeepSeek-V3-0324 | 2025-03-24 | — | Open weights |
| DeepSeek-V3 | 2024-12-26 | — | Open weights |
| DeepSeek-V2.5 | 2024-09-05 | — | Open weights |
| DeepSeek-V2 | 2024-05 | — | Open weights |
FAQ
What is DeepSeek-V2.5?
DeepSeek-V2.5 is an open-weight large language model released on September 5, 2024 that merged DeepSeek-V2-Chat (the 0628 checkpoint) and DeepSeek-Coder-V2-Instruct (the 0724 checkpoint) into a single model good at both general chat and coding. It is a 236B-total / 21B-active Mixture-of-Experts model with 128K context.
Is DeepSeek-V2.5 still the current model?
No. DeepSeek-V2.5 was revised as DeepSeek-V2.5-1210 on December 10, 2024, then superseded by DeepSeek-V3 (released December 26, 2024) and later versions. The open weights remain available on Hugging Face, but it is no longer DeepSeek's served model.
What license does DeepSeek-V2.5 use, and can I use it commercially?
The code is released under the MIT License and the model weights under the DeepSeek Model License, which permits commercial use including deployment, fine-tuning, and building products on top of the model.
How does DeepSeek-V2.5 perform on benchmarks?
DeepSeek's model card reports HumanEval (Python) 89, ArenaHard 76.2, MMLU-Pro 65.83, AlpacaEval 2.0 50.5, LiveCodeBench (01-09) 41.8, plus MT-Bench 9.02 and AlignBench 8.04 on their 0-10 scales — improving on both source models on most tests.