AI/TLDR

Shanghai AI Laboratory · 2026-05-13 · major

SU-01 — Shanghai AI Lab's 31B Open-Weight Reasoner Hits Gold-Medal Scores on IMO 2025 and USAMO 2026

Shanghai AI Laboratory released SU-01, a 31B open-weight (30B-A3B) reasoning model that scores 35 points on both IMO 2025 and USAMO 2026 with test-time scaling, via a reverse-perplexity SFT curriculum and two-stage RL.

SU-01 olympiad reasoning model repository from Simplified-Reasoning

A 31B open-weight model that reaches gold-medal olympiad scores from a documented post-training recipe.

Key specs

LicenseApache 2.0
Parameters31B total, 30B-A3B
Imo 202535 points (gold)
Usamo 202635 points (gold)
Aime 202693.3%

What is it?

SU-01 is an open-weight reasoning model from Shanghai AI Laboratory, built on a Qwen3 mixture-of-experts backbone with 31B total parameters and roughly 3B active per token. It is tuned to solve mathematical and scientific olympiad problems and to write rigorous proofs, not just final answers.

How does it work?

The team starts from a post-trained reasoning backbone and applies a reverse-perplexity curriculum during supervised fine-tuning on about 338K trajectories, instilling proof-search and self-checking behavior. A two-stage reinforcement learning pipeline of around 200 steps moves from verifiable-reward RL to proof-level RL, and test-time scaling runs a generate-verify-revise loop with reasoning traces exceeding 100K tokens.

Why does it matter?

It shows gold-medal olympiad reasoning can come from a compact open model and a written recipe rather than a large closed system. Researchers can download the weights under Apache 2.0 and reproduce or extend the method.

Who is it for?

ML researchers and math-reasoning teams

Try it

Model id Simplified-Reasoning/SU-01 on Hugging Face

Sources · 4 outlets

Tags

  • open-weights
  • reasoning-model
  • mixture-of-experts
  • olympiad-math
  • reinforcement-learning
  • test-time-scaling
  • qwen3
  • proof-generation

← All releases · Learn AI