Shanghai AI Laboratory · 2026-05-13 · major

SU-01 — Shanghai AI Lab's 31B Open-Weight Reasoner Hits Gold-Medal Scores on IMO 2025 and USAMO 2026

Item: SU-01 — Shanghai AI Lab's 31B Open-Weight Reasoner Hits Gold-Medal Scores on IMO 2025 and USAMO 2026
Rating: 4
Author: AI/TLDR

Shanghai AI Laboratory released SU-01, a 31B open-weight (30B-A3B) reasoning model that scores 35 points on both IMO 2025 and USAMO 2026 with test-time scaling, via a reverse-perplexity SFT curriculum and two-stage RL.

SU-01 olympiad reasoning model repository from Simplified-Reasoning

A 31B open-weight model that reaches gold-medal olympiad scores from a documented post-training recipe.

Key specs

License	Apache 2.0
Parameters	31B total, 30B-A3B
Imo 2025	35 points (gold)
Usamo 2026	35 points (gold)
Aime 2026	93.3%

What is it?

SU-01 is an open-weight reasoning model from Shanghai AI Laboratory, built on a Qwen3 mixture-of-experts backbone with 31B total parameters and roughly 3B active per token. It is tuned to solve mathematical and scientific olympiad problems and to write rigorous proofs, not just final answers.

How does it work?

The team starts from a post-trained reasoning backbone and applies a reverse-perplexity curriculum during supervised fine-tuning on about 338K trajectories, instilling proof-search and self-checking behavior. A two-stage reinforcement learning pipeline of around 200 steps moves from verifiable-reward RL to proof-level RL, and test-time scaling runs a generate-verify-revise loop with reasoning traces exceeding 100K tokens.

Why does it matter?

It shows gold-medal olympiad reasoning can come from a compact open model and a written recipe rather than a large closed system. Researchers can download the weights under Apache 2.0 and reproduce or extend the method.

Who is it for?

ML researchers and math-reasoning teams

Try it

Model id Simplified-Reasoning/SU-01 on Hugging Face