DeepSeek-Prover-V1.5

Name: DeepSeek-Prover-V1.5
Author: DeepSeek

A 7B open-weight Lean 4 theorem-proving model that pairs proof-assistant-feedback RL with Monte-Carlo tree search to search for and write formal proofs.

Overview

DeepSeek-Prover-V1.5 is a 7-billion-parameter open-weight language model from DeepSeek, built for automated theorem proving in the Lean 4 proof assistant. Introduced in a paper posted to arXiv on August 15, 2024 (2408.08152), it is the direct successor to DeepSeek-Prover-V1 and improves on it across both training and inference. The model is pre-trained from DeepSeekMath-Base 7B with extra specialisation on formal mathematics, then supervised-fine-tuned and refined with reinforcement learning. DeepSeek released three checkpoints — DeepSeek-Prover-V1.5-Base, DeepSeek-Prover-V1.5-SFT, and DeepSeek-Prover-V1.5-RL — all on Hugging Face.

Its two headline ideas are RLPAF and RMaxTS. RLPAF (reinforcement learning from proof assistant feedback) uses the Lean verifier itself as the reward: a proof that type-checks is a success signal, so the model learns directly from the proof assistant rather than from human preference labels. RMaxTS is a variant of Monte-Carlo tree search with an intrinsic-reward-driven exploration strategy; instead of generating a whole proof in a single pass, the model expands a search tree of partial proofs using a truncate-and-resume mechanism, which produces more diverse proof attempts and raises the chance of finding one that closes the goal.

On the standard formal-proving benchmarks, DeepSeek-Prover-V1.5-RL with RMaxTS reaches 63.5% on the high-school-level miniF2F test set and 25.3% on the undergraduate-level ProofNet — state-of-the-art for open-source provers at the time, and a large jump over DeepSeek-Prover-V1 (which scored 50.0% on miniF2F). It was later superseded by DeepSeek-Prover-V2 in April 2025, but the V1.5 weights remain freely downloadable for research and self-hosting.

Released	2024-08
License	Code is released under the MIT License. The model weights are released under the DeepSeek Model License, which permits commercial use. The model is derived from DeepSeekMath-Base 7B.
Weights	Open weights
Parameters	7B (dense; built on DeepSeekMath-Base 7B)
Context	4,096 tokens (maximum sequence length used in supervised fine-tuning and proof generation)
Max output	Not separately specified; generations are bounded by the 4,096-token context. Proofs are produced incrementally via a truncate-and-resume mechanism during tree search.
Architecture	Dense decoder-only transformer (7B), pre-trained from DeepSeekMath-Base 7B with additional specialisation on formal mathematical languages. Three checkpoints were released: Base (continued pretraining), SFT (supervised fine-tuning on an enhanced formal-proof dataset derived from DeepSeek-Prover-V1, with chain-of-thought and Lean 4 tactic-state conditioning), and RL (further refined with RLPAF — reinforcement learning from proof assistant feedback, using Lean's verifier as a reward signal). At inference it uses RMaxTS, an intrinsic-reward-driven variant of Monte-Carlo tree search that explores diverse proof paths via a truncate-and-resume mechanism, alongside a hybrid whole-proof / proof-step generation framework.
Knowledge cutoff	Not separately disclosed by DeepSeek. The model is specialised on formal-math corpora (Lean Mathlib4, Lean Workbook, and synthetic proof data) on top of DeepSeekMath-Base, which was pretrained on data up to roughly 2023-2024.
Modalities	Text
Status	Available (open weights on Hugging Face); superseded by DeepSeek-Prover-V2 (April 2025) but not formally deprecated.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

State-of-the-art open-source results at release: 63.5% on miniF2F-test and 25.3% on ProofNet with RMaxTS tree search, well ahead of DeepSeek-Prover-V1 (50.0% on miniF2F)
Learns directly from the Lean 4 verifier via RLPAF (reinforcement learning from proof assistant feedback) — the reward is whether a generated proof actually type-checks, not a human preference label
RMaxTS Monte-Carlo tree search with a truncate-and-resume mechanism generates diverse proof paths and improves discovery, with gains that stay stable across a wide range of sampling budgets
Hybrid whole-proof and proof-step generation with explicit Lean 4 tactic-state conditioning, combining the speed of single-pass proofs with the robustness of step-by-step search
Compact 7B dense model (built on DeepSeekMath-Base 7B) that runs on a single GPU, with Base, SFT, and RL checkpoints all openly published
Open weights under the DeepSeek Model License (commercial use permitted) with MIT-licensed code, enabling self-hosting, fine-tuning, and reproducible research

Best for

Automated theorem proving and proof search in the Lean 4 proof assistant
Formal verification of mathematics — closing goals in Mathlib4-style developments and competition-math formalisations (miniF2F / ProofNet)
Research on neural theorem proving, RL from verifier feedback, and Monte-Carlo tree search for proof generation
Generating candidate Lean proofs and tactic suggestions to assist human formalisation work
A reproducible open-weight baseline for building and benchmarking new formal-reasoning systems
Self-hosted formal-math experimentation and further fine-tuning under the open license

How to access

Provider	Model ID
Hugging Face (self-host) — RL checkpoint ↗	`deepseek-ai/DeepSeek-Prover-V1.5-RL`
Hugging Face (self-host) — SFT checkpoint ↗	`deepseek-ai/DeepSeek-Prover-V1.5-SFT`
Hugging Face (self-host) — Base checkpoint ↗	`deepseek-ai/DeepSeek-Prover-V1.5-Base`

DeepSeek Prover — every version

The full lineage of the DeepSeek Prover line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-Prover-V2current	2025-04-30	—	Open weights
DeepSeek-Prover-V1.5	2024-08	—	Open weights

FAQ

What is DeepSeek-Prover-V1.5?

It is a 7-billion-parameter open-weight language model from DeepSeek, released in August 2024, built to prove mathematical theorems in the Lean 4 proof assistant. It improves on DeepSeek-Prover-V1 by adding reinforcement learning from proof assistant feedback (RLPAF) and a Monte-Carlo tree search method called RMaxTS, and it was published with Base, SFT, and RL checkpoints on Hugging Face.

What are RLPAF and RMaxTS?

RLPAF stands for reinforcement learning from proof assistant feedback: instead of using human preference labels, DeepSeek uses Lean's own verifier as the reward signal, so the model is rewarded when its generated proof actually type-checks. RMaxTS is a variant of Monte-Carlo tree search with an intrinsic-reward-driven exploration strategy that builds a tree of partial proofs (via a truncate-and-resume mechanism) to find diverse, successful proof paths rather than generating a whole proof in one shot.

How well does DeepSeek-Prover-V1.5 perform?

With RMaxTS tree search, the RL checkpoint reaches 63.5% on the high-school-level miniF2F test set and 25.3% on the undergraduate-level ProofNet benchmark — state-of-the-art for open-source theorem provers at the time and a large jump over DeepSeek-Prover-V1's 50.0% on miniF2F. In single-pass whole-proof mode with a budget of 128 attempts it proves 51.6% of miniF2F problems.

Is DeepSeek-Prover-V1.5 still the latest version?

No. DeepSeek released DeepSeek-Prover-V2 in April 2025, which supersedes it. V1.5 has not been formally deprecated, however, and its open weights (Base, SFT, and RL) remain freely available on Hugging Face for research and self-hosting.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// DeepSeek Prover — every version

// FAQ