AI/TLDR

DeepSeek-V2.5

DeepSeek's September 2024 model that fused its V2 chat and Coder-V2 models into one open-weight MoE assistant.

Overview

DeepSeek-V2.5 is an open-weight large language model released by DeepSeek on September 5, 2024. It merged two earlier DeepSeek models — the general-purpose DeepSeek-V2-Chat (the 0628 checkpoint) and the code-focused DeepSeek-Coder-V2-Instruct (the 0724 checkpoint) — into a single assistant, so one model could handle everyday conversation and programming without users having to switch endpoints.

Under the hood DeepSeek-V2.5 keeps the DeepSeek-V2 architecture: a Mixture-of-Experts transformer with 236 billion total parameters of which about 21 billion are active per token, plus Multi-head Latent Attention (MLA) that compresses the key-value cache for cheaper inference. It is a text-only, non-reasoning model and supports a 128K-token context. The code is released under the MIT License and the weights under the DeepSeek Model License, which permits commercial use.

DeepSeek shipped a revised checkpoint, DeepSeek-V2.5-1210, on December 10, 2024, which improved math (MATH-500 from 74.8 to 82.8) and coding (LiveCodeBench from 29.2 to 34.38) along with file-upload and web-summarization tweaks. Shortly after, DeepSeek-V3 launched (December 26, 2024) and became the served model, making V2.5 a transitional release between the V2 and V3 generations.

Released2024-09-05
LicenseOpen weights — code under MIT License; model weights under the DeepSeek Model License (commercial use permitted).
WeightsOpen weights
Parameters236B total / 21B activated per token (Mixture-of-Experts)
Context128K tokens (advertised; the published Hugging Face config sets max_model_len to 8,192)
Max outputNot officially published by DeepSeek
ArchitectureMixture-of-Experts (DeepSeekMoE) transformer with Multi-head Latent Attention (MLA) for a compressed KV cache, inherited from DeepSeek-V2. V2.5 is a post-training merge of DeepSeek-V2-Chat (V2-0628) and DeepSeek-Coder-V2-Instruct (Coder-V2-0724) into a single instruction-tuned model.
Knowledge cutoffNot officially published by DeepSeek
ModalitiesText
StatusSuperseded. DeepSeek-V2.5 was the production model in late 2024 (revised as V2.5-1210 on 2024-12-10) and was replaced by DeepSeek-V3 from December 2024 onward. The open weights remain on Hugging Face, but it is no longer DeepSeek's served model.

Benchmarks

  1. HumanEval (Python)89pass@1 %
  2. ArenaHard76.2win rate %
  3. MMLU-Pro65.83%
  4. AlpacaEval 2.050.5LC win rate %
  5. LiveCodeBench (01-09)41.8%
  6. MT-Bench9.02score (0-10)
  7. AlignBench8.04score (0-10)

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

  • Combined strong general chat and coding in one model, removing the need to choose between a chat and a coder endpoint
  • Backward-compatible API: existing deepseek-chat and deepseek-coder calls pointed to the new merged model
  • MoE with MLA keeps only ~21B of 236B parameters active per token, making inference relatively cheap for its size
  • Improved human-preference alignment and safety over DeepSeek-V2-0628 (overall safety score rose to 82.6% from 74.4%)
  • Open weights with commercial-use permission, downloadable from Hugging Face
  • Practical developer features carried over: function calling, JSON output, and Fill-in-the-Middle (FIM) code completion

Best for

  • General-purpose chat assistants that also need solid coding help
  • Code generation, completion, and Fill-in-the-Middle tasks in developer tools
  • Self-hosted or on-prem deployments where open weights and commercial licensing matter
  • Function-calling and structured-JSON workflows
  • A reproducible historical baseline for studying the DeepSeek V2-to-V3 transition

How to access

ProviderModel ID
DeepSeek ↗deepseek-chat
Hugging Face ↗deepseek-ai/DeepSeek-V2.5
Ollama ↗deepseek-v2.5

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-V3.2current2025-12-01Open weights
DeepSeek-V3.2-Speciale2025-12-01Open weights
DeepSeek-V3.2-Exp2025-09-29Open weights
DeepSeek-V3.1-Terminus2025-09-22Open weights
DeepSeek-V3.12025-08-21Open weights
DeepSeek-V3-03242025-03-24Open weights
DeepSeek-V32024-12-26Open weights
DeepSeek-V2.52024-09-05Open weights
DeepSeek-V22024-05Open weights

FAQ

What is DeepSeek-V2.5?

DeepSeek-V2.5 is an open-weight large language model released on September 5, 2024 that merged DeepSeek-V2-Chat (the 0628 checkpoint) and DeepSeek-Coder-V2-Instruct (the 0724 checkpoint) into a single model good at both general chat and coding. It is a 236B-total / 21B-active Mixture-of-Experts model with 128K context.

Is DeepSeek-V2.5 still the current model?

No. DeepSeek-V2.5 was revised as DeepSeek-V2.5-1210 on December 10, 2024, then superseded by DeepSeek-V3 (released December 26, 2024) and later versions. The open weights remain available on Hugging Face, but it is no longer DeepSeek's served model.

What license does DeepSeek-V2.5 use, and can I use it commercially?

The code is released under the MIT License and the model weights under the DeepSeek Model License, which permits commercial use including deployment, fine-tuning, and building products on top of the model.

How does DeepSeek-V2.5 perform on benchmarks?

DeepSeek's model card reports HumanEval (Python) 89, ArenaHard 76.2, MMLU-Pro 65.83, AlpacaEval 2.0 50.5, LiveCodeBench (01-09) 41.8, plus MT-Bench 9.02 and AlignBench 8.04 on their 0-10 scales — improving on both source models on most tests.