DeepReinforce · 2026-06-25 · major
Ornith 1.0 — open-weight coding models that learn their own RL scaffold
Ornith 1.0 is an MIT-licensed family of agentic coding LLMs (9B, 31B, 35B MoE, 397B MoE) whose RL loop writes its own task-specific scaffold instead of using a fixed human-designed harness.

Open-weight agentic coding models that learn to write their own RL scaffold instead of relying on a fixed harness.
Key specs
| Variants | 9B / 31B / 35B-MoE / 397B-MoE |
|---|---|
| Swe bench verified (397 b) | 82.4% |
Quick facts
| Maker | DeepReinforce |
|---|---|
| Variants | 9B Dense, 31B Dense, 35B MoE, 397B MoE |
| Base models | Gemma 4 and Qwen 3.5 pretrained checkpoints |
| License | MIT |
| Availability | Open weights on Hugging Face (including FP8 quantizations) |
| What's new | Model and RL scaffold co-evolve during training |
| Released | June 25, 2026 |
Benchmarks
What is it?
Ornith 1.0 introduces a self-scaffolding training loop for agentic coding LLMs. DeepReinforce shipped four MIT-licensed variants on Hugging Face — a 9B dense, a 31B dense, a 35B mixture-of-experts, and a 397B MoE flagship — built on Gemma 4 and Qwen 3.5 pretrained checkpoints.
How does it work?
The training loop runs each reinforcement learning step in two stages. First, the model reads a coding task and proposes a refined scaffold for solving it; second, it uses that scaffold to generate a solution rollout. Across steps, the scaffold co-evolves with the model's policy, so Ornith 1.0 effectively learns the harness instead of inheriting a hand-built one.
Why does it matter?
Open coding models usually inherit a fixed human harness, which caps how far they can self-improve on new tasks. Ornith 1.0 ships an open recipe where the scaffold is itself a training target — and the 397B flagship hits 82.4% on SWE-Bench Verified and 62.2% on the harder SWE-Bench Pro, giving open-source teams a strong agentic coding baseline under MIT.
Who is it for?
Open-source agent builders and RL researchers
Frequently asked questions
- What does "self-scaffolding" mean in Ornith 1.0?
- In Ornith 1.0, each reinforcement learning step runs in two stages: the model first reads a coding task and proposes a refined scaffold, then uses that scaffold to generate a solution rollout. The scaffold co-evolves with the policy, so the model effectively writes the harness that guides its own search instead of relying on a hand-built one.
- How does Ornith 1.0 score on SWE-Bench compared to SWE-Bench Pro?
- Ornith-1.0-397B reports 82.4% on SWE-Bench Verified and 62.2% on the harder SWE-Bench Pro, per the DeepReinforce announcement. It also reports 77.5% on Terminal-Bench 2.1 (Terminus-2), positioning the flagship as a strong open-source result for agentic coding evaluations.
- Which Ornith 1.0 size should I run locally?
- DeepReinforce ships Ornith 1.0 in 9B Dense, 31B Dense, 35B MoE, and 397B MoE variants, with FP8 quantizations for the larger ones. The 9B is aimed at edge or single-GPU use; the 35B MoE keeps active parameters small for cheap inference, and the 397B flagship is the top-scoring open variant for serious agent workloads.
- Can I use Ornith 1.0 commercially?
- Yes — Ornith 1.0 weights are released under the MIT license on Hugging Face, which allows commercial use, modification, and redistribution. DeepReinforce has not yet posted pricing for a hosted API, so for now the only deployment path is self-hosting the open weights through vLLM, SGLang, or a similar serving stack.
Try it
huggingface.co/collections/deepreinforce-ai/ornith-10