How does VibeThinker-3B score on AIME and LiveCodeBench?

VibeThinker-3B posts 94.3 on AIME26 and 91.4 on AIME25 (math contest), plus 80.2 Pass@1 on LiveCodeBench v6 and a 96.1% acceptance rate on recent unseen LeetCode contests. On IMO-AnswerBench VibeThinker-3B scores 76.4. The Weibo team reports it is competitive with much larger frontier models on these specific benchmarks.

What training recipe does VibeThinker-3B use?

The VibeThinker-3B training pipeline strengthens data synthesis, quality filtering, and curriculum learning in supervised fine-tuning, then extends MGPO-style reinforcement learning to multiple verifiable domains while preserving long-context reasoning trajectories. VibeThinker-3B finishes with offline self-distillation and instruction-tuning RL. The Weibo team calls this approach the Spectrum-to-Signal Principle.

Is VibeThinker-3B really competitive with frontier models?

VibeThinker-3B's wins are on verifiable math contests and competitive programming evaluations, not general chat or open-ended tasks. Some researchers have publicly questioned whether contest benchmarks like AIME and LiveCodeBench have leaked into VibeThinker-3B's post-training data. Treat the headline 'matches 200x larger models' framing as benchmark-specific rather than a general capability claim.

Weibo AI · 2026-06-16 · notable

VibeThinker-3B — Weibo's 3B reasoning model hits 80.2% on LiveCodeBench v6

VibeThinker-3B is a 3-billion-parameter dense reasoning model from Sina Weibo's AI lab that posts 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, with MIT-licensed weights on HuggingFace and code on GitHub.

VibeThinker-3B model card thumbnail on HuggingFace — HuggingFace

Sina Weibo's 3B model finetuned from Qwen2.5-Coder-3B, MIT-licensed, scoring 94.3 on AIME26 and 80.2 on LiveCodeBench v6.

Key specs

License	MIT
Parameters	3B
GitHub stars	903
Aime26	94.3
Aime25	91.4
Hmmt25	89.3
Bru mo25	93.8
Imo answer bench	76.4
Live code bench v6 pass@1	80.2
Leet code acceptance	96.1%

Quick facts

Maker	Sina Weibo AI Lab
Parameters	3B dense
Base model	Qwen2.5-3B (post-trained from Qwen2.5-Coder-3B)
License	MIT
Weights	HuggingFace, ModelScope
Paper	arXiv 2606.16140

Benchmarks

AIME26

VibeThinker-3B		94.3%

source ↗

LiveCodeBench v6 (Pass@1)

VibeThinker-3B		80.2%

source ↗

What is it?

VibeThinker-3B is a 3-billion-parameter dense reasoning model from Sina Weibo's AI Lab. The model is built on Qwen2.5-3B, post-trained from Qwen2.5-Coder-3B, and released under the MIT license with weights on HuggingFace and ModelScope. VibeThinker-3B targets verifiable reasoning tasks like math contests and competitive programming, not open chat.

How does it work?

The VibeThinker-3B training pipeline starts with supervised fine-tuning that adds new data synthesis, quality filtering, and curriculum learning. The Weibo team then extends MGPO-style reinforcement learning across multiple verifiable domains while keeping full long-context reasoning trajectories intact, and finishes with offline self-distillation plus instruction-tuning RL. They call the overall approach the Spectrum-to-Signal Principle.

Why does it matter?

VibeThinker-3B posts 94.3 on AIME26, 80.2 Pass@1 on LiveCodeBench v6, and a 96.1% acceptance rate on unseen LeetCode contests. Researchers are openly arguing over whether contest benchmarks are now too leaky to trust at this size, but either way VibeThinker-3B is a small, runnable artifact and a useful test-bed for the small-model reasoning argument playing out across labs.

Who is it for?

ML researchers, small-model practitioners

Frequently asked questions

What is VibeThinker-3B?: VibeThinker-3B is a 3-billion-parameter dense reasoning model from Sina Weibo's AI Lab, finetuned from Qwen2.5-Coder-3B. VibeThinker-3B is released under the MIT license with weights on HuggingFace and ModelScope and code on GitHub. The model is designed for verifiable reasoning tasks like math contests and competitive programming.
How does VibeThinker-3B score on AIME and LiveCodeBench?: VibeThinker-3B posts 94.3 on AIME26 and 91.4 on AIME25 (math contest), plus 80.2 Pass@1 on LiveCodeBench v6 and a 96.1% acceptance rate on recent unseen LeetCode contests. On IMO-AnswerBench VibeThinker-3B scores 76.4. The Weibo team reports it is competitive with much larger frontier models on these specific benchmarks.
What training recipe does VibeThinker-3B use?: The VibeThinker-3B training pipeline strengthens data synthesis, quality filtering, and curriculum learning in supervised fine-tuning, then extends MGPO-style reinforcement learning to multiple verifiable domains while preserving long-context reasoning trajectories. VibeThinker-3B finishes with offline self-distillation and instruction-tuning RL. The Weibo team calls this approach the Spectrum-to-Signal Principle.
Is VibeThinker-3B really competitive with frontier models?: VibeThinker-3B's wins are on verifiable math contests and competitive programming evaluations, not general chat or open-ended tasks. Some researchers have publicly questioned whether contest benchmarks like AIME and LiveCodeBench have leaked into VibeThinker-3B's post-training data. Treat the headline 'matches 200x larger models' framing as benchmark-specific rather than a general capability claim.

Try it

https://huggingface.co/WeiboAI/VibeThinker-3B