Shanghai AI Laboratory · 2026-05-13 · major
SU-01 — Shanghai AI Lab's 31B Open-Weight Reasoner Hits Gold-Medal Scores on IMO 2025 and USAMO 2026
Shanghai AI Laboratory released SU-01, a 31B open-weight (30B-A3B) reasoning model that scores 35 points on both IMO 2025 and USAMO 2026 with test-time scaling, via a reverse-perplexity SFT curriculum and two-stage RL.
A 31B open-weight model that reaches gold-medal olympiad scores from a documented post-training recipe.
Key specs
| License | Apache 2.0 |
|---|---|
| Parameters | 31B total, 30B-A3B |
| Imo 2025 | 35 points (gold) |
| Usamo 2026 | 35 points (gold) |
| Aime 2026 | 93.3% |
What is it?
SU-01 is an open-weight reasoning model from Shanghai AI Laboratory, built on a Qwen3 mixture-of-experts backbone with 31B total parameters and roughly 3B active per token. It is tuned to solve mathematical and scientific olympiad problems and to write rigorous proofs, not just final answers.
How does it work?
The team starts from a post-trained reasoning backbone and applies a reverse-perplexity curriculum during supervised fine-tuning on about 338K trajectories, instilling proof-search and self-checking behavior. A two-stage reinforcement learning pipeline of around 200 steps moves from verifiable-reward RL to proof-level RL, and test-time scaling runs a generate-verify-revise loop with reasoning traces exceeding 100K tokens.
Why does it matter?
It shows gold-medal olympiad reasoning can come from a compact open model and a written recipe rather than a large closed system. Researchers can download the weights under Apache 2.0 and reproduce or extend the method.
Who is it for?
ML researchers and math-reasoning teams
Try it
Model id Simplified-Reasoning/SU-01 on Hugging Face