Armin Ronacher · 2026-07-04 · notable

Armin Ronacher — Better Models: Worse Tools

Armin Ronacher shows Claude Opus 4.8 and Claude Sonnet 5 emit malformed tool calls on schemas they weren't RL-trained against — appending invented JSON keys that Claude Code silently repairs but other harnesses reject.

Anthropic's newest models produce invalid tool calls outside Claude Code — because the RL harness that trained them fixed the mistakes for free.

What is it?

Better Models: Worse Tools is Armin Ronacher's July 4, 2026 post on a regression he keeps hitting inside Pi's edit tool: Claude Opus 4.8 and Claude Sonnet 5 produce a correct edit call, then append extra JSON fields that were never in the schema. Older Claude models, and models from other labs, do not do this.

How does it work?

The post traces the failure to reinforcement learning on Claude Code's harness, which quietly repairs sloppy tool calls before the model sees the error. Trained against that forgiving loop, the newer models learned that extra keys are harmless — a habit that breaks any harness with strict JSON parsing, including Pi and the OpenAI Harmony framework.
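The gap between the two harness behaviors can be sketched in a few lines. This is only an illustration under stated assumptions: the field names `path`, `old_text`, and `new_text`, the invented `reason` key, and both helper functions are hypothetical, and none of them are taken from Pi's or Claude Code's actual edit tool.

```python
import json

# Hypothetical schema: these field names are illustrative, not Pi's real edit tool.
ALLOWED_KEYS = {"path", "old_text", "new_text"}


def strict_parse(raw: str) -> dict:
    """Reject any tool call that carries keys outside the schema."""
    args = json.loads(raw)
    extra = set(args) - ALLOWED_KEYS
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    return args


def lenient_parse(raw: str) -> dict:
    """Silently drop unknown keys, so the model never sees an error."""
    args = json.loads(raw)
    return {k: v for k, v in args.items() if k in ALLOWED_KEYS}


# A correct edit call with one invented field appended (hypothetical example).
call = '{"path": "a.py", "old_text": "x", "new_text": "y", "reason": "tidy"}'
```

Here `lenient_parse(call)` returns the three schema fields and discards `reason`, while `strict_parse(call)` raises `ValueError`. A model trained only against the lenient path gets no signal that the extra key was wrong.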

Why does it matter?

Anyone building agents on Anthropic models outside Claude Code can hit these failures: a harness with strict parsing rejects the very calls that Claude Code would have repaired silently. Armin Ronacher's practical fix is to enable LARK-style grammar-constrained decoding so the sampler cannot emit fields outside the schema, which makes it one of the more actionable recommendations to come out of the current tool-use debate.
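The idea behind constrained decoding can be shown with a toy sketch. Real implementations mask individual token logits against a compiled grammar. The version below works at the level of whole JSON keys, and the key names, the scores, and `pick_next_key` are all hypothetical, chosen only to make the masking step visible.

```python
from typing import Dict, Optional, Set

# Hypothetical schema keys; a real grammar would be compiled from the tool's schema.
ALLOWED_KEYS = ["path", "old_text", "new_text"]


def pick_next_key(scores: Dict[str, float], already_emitted: Set[str]) -> Optional[str]:
    """Pick the best-scoring key the schema still permits, or None to close the object."""
    permitted = [k for k in ALLOWED_KEYS if k not in already_emitted]
    if not permitted:
        return None  # every schema key is used, so only the closing brace is legal
    # Candidates outside the schema are never considered, whatever their score.
    return max(permitted, key=lambda k: scores.get(k, float("-inf")))


# Toy model preferences: the invented key "reason" has the highest raw score.
scores = {"reason": 0.9, "new_text": 0.6, "path": 0.3}
```

With these scores, an unconstrained sampler would favor `reason`, but `pick_next_key(scores, set())` can only return a schema key, and once all three keys have been emitted it returns `None`, so the object has to close.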
