Manling Li et al. · 2026-04-07 · notable
RAGEN-2: Reasoning Collapse in Agentic RL
Identifies 'template collapse' in multi-turn RL-trained agents and proposes an SNR-aware prompt filter that improves performance across planning, math reasoning, web navigation, and code execution.
A diagnosis and a fix for a failure mode in RL-trained agents: models that look diverse but are actually copy-pasting a template.
What is it?
RAGEN-2 is the follow-up paper from the original RAGEN team, a group led by Manling Li that includes researchers from Stanford, Northwestern, Microsoft and the University of Washington. The paper studies why reinforcement-learning training of multi-turn agents often destabilises, and names a specific failure mode it calls 'template collapse.' In template collapse, the model's outputs still look diverse when measured by entropy, but they no longer change in response to different inputs: the agent is effectively regurgitating a template.
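A toy illustration of that distinction, as a minimal sketch rather than the paper's measurement code: two hypothetical agents each sample four outputs for two inputs. Both have the same within-input entropy, but only one produces outputs that depend on the input. Dependence on the input is measured here as mutual information between input and output, estimated from raw counts. The function names, task names, and data are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def diversity_and_dependence(samples):
    """samples maps each input to a list of sampled outputs.

    Returns (H(Y|X), I(X;Y)): the average within-input entropy and the
    mutual information between input and output, with each input weighted
    by its share of the samples.
    """
    total = sum(len(outs) for outs in samples.values())
    marginal = Counter()
    cond_entropy = 0.0
    for outs in samples.values():
        counts = Counter(outs)
        marginal.update(counts)
        cond_entropy += (len(outs) / total) * entropy(counts)
    # I(X;Y) = H(Y) - H(Y|X)
    return cond_entropy, entropy(marginal) - cond_entropy


# Hypothetical collapsed agent: varied outputs, but the same ones for every input.
collapsed = {
    "task_a": ["t1", "t2", "t3", "t4"],
    "task_b": ["t1", "t2", "t3", "t4"],
}
# Hypothetical healthy agent: equally varied outputs that differ by input.
healthy = {
    "task_a": ["a1", "a2", "a3", "a4"],
    "task_b": ["b1", "b2", "b3", "b4"],
}

collapsed_h, collapsed_mi = diversity_and_dependence(collapsed)
healthy_h, healthy_mi = diversity_and_dependence(healthy)
print(f"collapsed: entropy={collapsed_h:.2f} bits, MI={collapsed_mi:.2f} bits")
print(f"healthy:   entropy={healthy_h:.2f} bits, MI={healthy_mi:.2f} bits")
```

In this toy setup entropy cannot tell the two agents apart, while mutual information is zero for the collapsed one. Real agent outputs are long token sequences, so estimating these quantities in practice would need something more than exact-match counting.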
How does it work?
The authors argue that entropy alone cannot detect template collapse. They decompose reasoning quality into within-input diversity (entropy) and cross-input distinguishability (mutual information), and show that mutual information correlates with downstream performance far better than entropy does. They then explain template collapse via a signal-to-noise-ratio (SNR) mechanism and propose 'SNR-aware filtering': selecting prompts for RL training based on reward variance. The authors report that the fix improves performance across planning, math reasoning, web navigation and code execution.
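Selecting prompts by reward variance can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the function name, the `min_variance` threshold, and the rule of keeping prompts above a fixed cutoff are all hypothetical. The working assumption is that a prompt whose rollouts all receive the same reward offers little signal to learn from.

```python
from statistics import pvariance


def filter_prompts_by_reward_variance(rollout_rewards, min_variance=0.05):
    """Keep prompts whose sampled rollouts disagree enough in reward.

    rollout_rewards: dict mapping a prompt id to the rewards of its rollouts.
    min_variance: hypothetical cutoff chosen for illustration, not a value
    taken from the paper.
    """
    kept = []
    for prompt_id, rewards in rollout_rewards.items():
        if len(rewards) < 2:
            continue  # a single rollout says nothing about reward spread
        if pvariance(rewards) >= min_variance:
            kept.append(prompt_id)
    return kept


# Hypothetical batch: four prompts, four rollouts each, binary rewards.
batch = {
    "easy": [1.0, 1.0, 1.0, 1.0],    # always solved, variance 0
    "mixed": [1.0, 0.0, 1.0, 0.0],   # solved half the time
    "hard": [0.0, 0.0, 0.0, 0.0],    # never solved, variance 0
    "skewed": [1.0, 0.0, 0.0, 0.0],  # occasionally solved
}
selected = filter_prompts_by_reward_variance(batch)
print(selected)
```

Here the always-solved and never-solved prompts are dropped and the two prompts with mixed outcomes are kept. A real pipeline might instead rank prompts by variance or recompute the statistic as the policy changes; the source does not specify these details.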
Why does it matter?
Agentic RL is a widely used approach for training the next generation of tool-using models, yet the conditions under which it breaks are still poorly understood. RAGEN-2 gives you a concrete diagnostic (mutual information, not entropy) and a concrete fix (SNR-aware filtering) that you can drop into existing RL pipelines.
Who is it for?
Anyone training or debugging RL-finetuned agent models.