DeepSeek-Math-V2

Name: DeepSeek-Math-V2
Author: DeepSeek

685B open-weight self-verifying math prover that scores IMO/CMO gold and 118/120 on Putnam 2024.

Overview

DeepSeek-Math-V2 is a specialized mathematical-reasoning model from DeepSeek, released on 27 November 2025 as the current flagship of the DeepSeek Math line. It is a 685-billion-parameter Mixture-of-Experts model built on the DeepSeek-V3.2-Exp-Base checkpoint, and it is published as open weights on Hugging Face under the Apache 2.0 license. Unlike a general chat model, DeepSeek-Math-V2 is tuned for writing rigorous natural-language proofs rather than just producing a final numeric answer.

The model's defining idea is self-verification. DeepSeek trains an LLM-based verifier with reinforcement learning (GRPO) that scores a candidate proof on rigor and completeness, plus a meta-verifier that checks the verifier's own critiques so it does not hallucinate flaws. The proof generator is then rewarded for producing solutions whose self-assessed quality matches the verifier's judgment, and it iteratively repairs its own proofs within a 128K-token context before finalizing an answer. This 'verify-first' pipeline is what lets DeepSeek-Math-V2 catch and fix its own mistakes the way a human mathematician re-checks a proof.

With scaled test-time compute and large verification budgets, DeepSeek-Math-V2 reaches gold-medal-level performance on IMO 2025 and CMO 2024 and scores 118 of 120 on Putnam 2024 — above the best human score that year. On the IMO-ProofBench evaluation it outperforms Google DeepMind's DeepThink IMO Gold on the Basic subset and stays competitive on the Advanced subset, making it one of the strongest openly released math-reasoning models to date. Because it is open-weight, researchers can download and run it via the DeepSeek-V3.2-Exp inference stack.

Released	2025-11-27
License	Apache-2.0
Weights	Open weights
Parameters	685B total (MoE)
Context	128K
Architecture	Mixture-of-Experts (verifier-generator self-verification)
Modalities	Text
Status	Research preview

Benchmarks

Bar chart comparing DeepSeekMath-V2 (Heavy) with Claude Sonnet 4, DeepSeek R1, Qwen3-235B, Grok 4, Gemini 2.5 Pro, GPT-5, Gemini 2.5 Pro (Huang & Yang 2025), and Gemini Deep Think (IMO lite / IMO Gold) on the IMO-ProofBench benchmark, showing human-evaluation scores on ProofBench-Basic and ProofBench-Advanced. — DeepSeekMath-V2 vs. leading models on IMO-ProofBench (Basic and Advanced), scored by human evaluation. — DeepSeek

IMO-ProofBench human-evaluation scores (%): DeepSeekMath-V2 (Heavy) vs. other named models, on ProofBench-Basic and ProofBench-Advanced. Transcribed from the IMO-ProofBench.png chart on the DeepSeek-Math-V2 model card.

Benchmark	Claude Sonnet 4	DeepSeek R1	Qwen3-235B	Grok 4	Gemini 2.5 Pro	GPT-5	Gemini 2.5 Pro with (Huang & Yang, 2025)	Gemini Deep Think (IMO lite)	Gemini Deep Think (IMO Gold)	DeepSeekMath-V2 (Heavy)
IMO-ProofBench (ProofBench-Basic)	27.1%	29%	33.3%	46.7%	55.2%	59%	69.5%	83.8%	89%	99%
IMO-ProofBench (ProofBench-Advanced)	4.8%	3.8%	5.2%	18.6%	17.6%	20%	24.8%	37.6%	65.7%	61.9%

Comparison source ↗

This model's scores

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

Open Apache-2.0 weights at frontier scale (685B MoE), free for research and commercial use and fully self-hostable
Self-verifying proof pipeline: a GRPO-trained verifier plus meta-verifier let the model critique and repair its own proofs before answering
Gold-medal-level results on elite math olympiads (IMO 2025, CMO 2024) and a near-perfect 118/120 on Putnam 2024
Writes rigorous natural-language proofs, not just final answers — built for theorem-proving rather than arithmetic
Beats DeepMind's DeepThink IMO Gold on the IMO-ProofBench Basic subset among compared systems

Best for

Automated theorem proving and verification of multi-step mathematical proofs
Research on self-verification, verifier-reward training, and test-time-compute scaling for reasoning
Generating and grading olympiad-style and competition math solutions
Self-hosted math-reasoning deployments where open Apache-2.0 weights are required
Building proof-checking or math-tutoring tools on top of an open-weight reasoning model

How to access

Provider	Model ID
Hugging Face (open weights) ↗	`deepseek-ai/DeepSeek-Math-V2`

DeepSeek Math — every version

The full lineage of the DeepSeek Math line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-Math-V2current	2025-11-27	—	Apache-2.0
DeepSeekMath	2024-04	—	Open weights

FAQ

Is DeepSeek-Math-V2 open source?

The weights are published openly on Hugging Face under the Apache 2.0 license, so you can download, run, fine-tune, and use DeepSeek-Math-V2 for research and commercial purposes, including self-hosted deployment. It is run with the DeepSeek-V3.2-Exp inference stack rather than a hosted DeepSeek API endpoint.

How big is DeepSeek-Math-V2?

It is a 685-billion-parameter Mixture-of-Experts model built on the DeepSeek-V3.2-Exp-Base checkpoint, and it works within a 128K-token context. It is text-only and specialized for mathematical proof writing rather than general chat.

What makes DeepSeek-Math-V2 different from other reasoning models?

It is trained to self-verify. DeepSeek trains an LLM verifier (with GRPO) and a meta-verifier that score proofs on rigor and completeness, and the proof generator is rewarded for producing solutions that hold up to that verification — iteratively repairing its own proofs before finalizing an answer.

How well does DeepSeek-Math-V2 perform on math competitions?

With scaled test-time compute it reaches gold-medal-level performance on IMO 2025 and CMO 2024 and scores 118 of 120 on Putnam 2024, above the top human score that year. On IMO-ProofBench it outperforms DeepMind's DeepThink IMO Gold on the Basic subset and stays competitive on the Advanced subset.

// Overview

// Benchmarks

This model's scores

// Strengths

// Best for

// How to access

// DeepSeek Math — every version

// FAQ