Overview
DeepSeekMath is a 7-billion-parameter open-weight math model from DeepSeek, first released in February 2024 (with the v3 paper revision following in April 2024). Rather than training from scratch, DeepSeek continued pre-training its DeepSeek-Coder-Base-v1.5 7B checkpoint on 120 billion math-related tokens scraped from Common Crawl, plus natural-language and code data. The result was a small model whose mathematical reasoning rivaled far larger systems: DeepSeekMath approached the MATH-benchmark level of Gemini-Ultra and GPT-4 while being open and self-hostable.
DeepSeek shipped three variants: DeepSeekMath-Base 7B (the continued-pretrained foundation), DeepSeekMath-Instruct 7B (chain-of-thought instruction-tuned), and DeepSeekMath-RL 7B (the strongest, refined with reinforcement learning). The headline result — 51.7% on the competition-level MATH benchmark and 88.2% on GSM8K using chain-of-thought without any external tools — came from the RL variant and beat every open-source model from 7B to 70B at the time. All variants run in a 4,096-token context.
DeepSeekMath's most lasting contribution is GRPO (Group Relative Policy Optimization), the reinforcement-learning algorithm introduced in this paper. GRPO is a memory-efficient variant of PPO that drops the separate value/critic model and instead estimates the baseline from a group of sampled outputs. It is the same method DeepSeek later scaled up to train DeepSeek-R1, making DeepSeekMath a direct technical ancestor of DeepSeek's reasoning models. The line continues today with the much larger DeepSeek-Math-V2.
| Released | 2024-04 |
|---|---|
| License | MIT (code) + DeepSeek Model License (weights); commercial use permitted |
| Weights | Open weights |
| Parameters | 7B (dense) |
| Context | 4K (4,096 tokens) |
| Architecture | Dense decoder-only transformer, continued from DeepSeek-Coder-Base-v1.5 7B on 120B math tokens; RL variant trained with GRPO |
| Modalities | Text |
| Status | Superseded — the current flagship of the DeepSeek Math line is DeepSeek-Math-V2 (Nov 2025). The original 7B weights remain available on Hugging Face. |
Benchmarks
- MATH (chain-of-thought, no tools) — DeepSeekMath-RL 7B51.7%
- MATH (self-consistency over 64 samples) — DeepSeekMath 7B60.9%
- GSM8K (chain-of-thought) — DeepSeekMath-RL 7B88.2%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- Introduced GRPO, the group-relative RL method DeepSeek later reused to train DeepSeek-R1
- Strong math reasoning from a small 7B model — 51.7% on MATH and 88.2% on GSM8K without external tools
- Continued pre-training on 120B math tokens from public web data, showing high-quality data engineering over raw scale
- Fully open weights with commercial-use license, in three variants (Base, Instruct, RL) for self-hosting and fine-tuning
- Approaches Gemini-Ultra / GPT-4-level MATH accuracy at a fraction of the parameter count
Best for
- Self-hosted math problem solving and step-by-step (chain-of-thought) reasoning
- Research baseline for GRPO and reinforcement-learning-for-reasoning experiments
- Fine-tuning a small, efficient math foundation model for tutoring or STEM assistants
- Studying math-focused continued pre-training and data curation from web corpora
- Running competition-style math (GSM8K / MATH) evaluation on commodity hardware
How to access
| Provider | Model ID |
|---|---|
| Hugging Face (open weights — RL variant) ↗ | deepseek-ai/deepseek-math-7b-rl |
| Hugging Face (open weights — Instruct variant) ↗ | deepseek-ai/deepseek-math-7b-instruct |
| Hugging Face (open weights — Base variant) ↗ | deepseek-ai/deepseek-math-7b-base |
DeepSeek Math — every version
The full lineage of the DeepSeek Math line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-Math-V2current | 2025-11-27 | — | Apache-2.0 |
| DeepSeekMath | 2024-04 | — | Open weights |
FAQ
Is DeepSeekMath open source and free to use?
Yes. The weights are published on Hugging Face in three variants (Base, Instruct, and RL). The code is MIT-licensed and the model weights are covered by the DeepSeek Model License, which permits commercial use. You can download, run, and fine-tune the model yourself.
What is GRPO, and why does DeepSeekMath matter?
GRPO (Group Relative Policy Optimization) is the reinforcement-learning algorithm introduced in the DeepSeekMath paper. It is a memory-efficient variant of PPO that removes the separate critic model and instead computes a baseline from a group of sampled answers. DeepSeek later used GRPO to train DeepSeek-R1, so DeepSeekMath is a direct technical ancestor of its reasoning models.
How well does DeepSeekMath perform on math benchmarks?
The DeepSeekMath-RL 7B variant scores 51.7% on the competition-level MATH benchmark and 88.2% on GSM8K using chain-of-thought without external tools. With self-consistency over 64 samples, the model reaches 60.9% on MATH — approaching the MATH accuracy of Gemini-Ultra and GPT-4 despite being only 7B parameters.
Is DeepSeekMath still the latest model in its line?
No. DeepSeekMath (2024) is the original 7B model. DeepSeek released DeepSeek-Math-V2 in November 2025 — a much larger 685B open-weight self-verifying prover. The original 7B DeepSeekMath weights are still available, but V2 is the current flagship of the DeepSeek Math line.