AI/TLDR

DeepReinforce · 2026-06-25 · major

Ornith 1.0 — open-weight coding models that learn their own RL scaffold

Ornith 1.0 is an MIT-licensed family of agentic coding LLMs (9B, 31B, 35B MoE, 397B MoE) whose RL loop writes its own task-specific scaffold instead of using a fixed human-designed harness.


Open-weight agentic coding models that learn to write their own RL scaffold instead of relying on a fixed harness.

Key specs

Variants: 9B / 31B / 35B-MoE / 397B-MoE
SWE-Bench Verified (397B): 82.4%

Quick facts

Maker: DeepReinforce
Variants: 9B Dense, 31B Dense, 35B MoE, 397B MoE
Base models: Gemma 4 and Qwen 3.5 pretrained checkpoints
License: MIT
Availability: Open weights on Hugging Face (including FP8 quantizations)
What's new: Model and RL scaffold co-evolve during training
Released: June 25, 2026

Benchmarks

SWE-Bench Verified: Ornith-1.0-397B, 82.4%
SWE-Bench Pro: Ornith-1.0-397B, 62.2%
Terminal-Bench 2.1 (Terminus-2): Ornith-1.0-397B, 77.5%

What is it?

Ornith 1.0 introduces a self-scaffolding training loop for agentic coding LLMs. DeepReinforce shipped four MIT-licensed variants on Hugging Face — a 9B dense, a 31B dense, a 35B mixture-of-experts, and a 397B MoE flagship — built on Gemma 4 and Qwen 3.5 pretrained checkpoints.

How does it work?

The training loop runs each reinforcement learning step in two stages. First, the model reads a coding task and proposes a refined scaffold for solving it; second, it uses that scaffold to generate a solution rollout. Across steps, the scaffold co-evolves with the model's policy, so Ornith 1.0 effectively learns the harness instead of inheriting a hand-built one.
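The two-stage step described above can be sketched as a toy loop. This is only an illustration under stated assumptions, not DeepReinforce's training code: `ToyPolicy`, `StepRecord`, `train`, and the reward function are hypothetical names, the "model" is a string-formatting stub rather than an LLM, and carrying the previous scaffold forward into the next step is one possible reading of "co-evolves". The source does not say which RL algorithm performs the update, so `update` is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepRecord:
    task: str
    scaffold: str
    solution: str
    reward: float


class ToyPolicy:
    """Hypothetical stand-in for the model; a real policy would be an LLM."""

    def __init__(self) -> None:
        self.updates = 0

    def propose_scaffold(self, task: str, previous: str) -> str:
        # Stage 1: read the task and refine the scaffold from the last step.
        hint = f"plan v{self.updates} for {task!r}"
        return hint if not previous else f"{previous}; {hint}"

    def rollout(self, task: str, scaffold: str) -> str:
        # Stage 2: generate a solution while following the scaffold.
        return f"solution to {task!r} using <{scaffold}>"

    def update(self, record: StepRecord) -> None:
        # Placeholder for the RL update (algorithm not specified in the source).
        self.updates += 1


def train(
    policy: ToyPolicy,
    tasks: List[str],
    reward_fn: Callable[[str, str], float],
    initial_scaffold: str = "",
) -> List[StepRecord]:
    """Run one two-stage step per task and return the step history."""
    history: List[StepRecord] = []
    scaffold = initial_scaffold
    for task in tasks:
        scaffold = policy.propose_scaffold(task, scaffold)  # stage 1
        solution = policy.rollout(task, scaffold)           # stage 2
        record = StepRecord(task, scaffold, solution, reward_fn(task, solution))
        policy.update(record)  # scaffold proposals and policy change together
        history.append(record)
    return history


if __name__ == "__main__":
    # Toy reward: 1.0 if the solution mentions the task, else 0.0.
    steps = train(ToyPolicy(), ["fix bug", "add test"],
                  lambda task, sol: 1.0 if task in sol else 0.0)
    for step in steps:
        print(step.reward, "|", step.scaffold)
```

The point of the structure is that the same policy produces both the scaffold and the rollout, so a single reward signal can shape both; a fixed harness would instead pass a constant scaffold into stage 2.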

Why does it matter?

Open coding models usually inherit a fixed human harness, which caps how far they can self-improve on new tasks. Ornith 1.0 ships an open recipe where the scaffold is itself a training target — and the 397B flagship hits 82.4% on SWE-Bench Verified and 62.2% on the harder SWE-Bench Pro, giving open-source teams a strong agentic coding baseline under MIT.

Who is it for?

Open-source agent builders and RL researchers

Frequently asked questions

What does "self-scaffolding" mean in Ornith 1.0?
In Ornith 1.0, each reinforcement learning step runs in two stages: the model first reads a coding task and proposes a refined scaffold, then uses that scaffold to generate a solution rollout. The scaffold co-evolves with the policy, so the model effectively writes the harness that guides its own search instead of relying on a hand-built one.
How does Ornith 1.0 score on SWE-Bench compared to SWE-Bench Pro?
Ornith-1.0-397B reports 82.4% on SWE-Bench Verified and 62.2% on the harder SWE-Bench Pro, per the DeepReinforce announcement. It also reports 77.5% on Terminal-Bench 2.1 (Terminus-2), positioning the flagship as a strong open-source result for agentic coding evaluations.
Which Ornith 1.0 size should I run locally?
DeepReinforce ships Ornith 1.0 in 9B Dense, 31B Dense, 35B MoE, and 397B MoE variants, with FP8 quantizations for the larger ones. The 9B is aimed at edge or single-GPU use; the 35B MoE keeps active parameters small for cheap inference, and the 397B flagship is the top-scoring open variant for serious agent workloads.
Can I use Ornith 1.0 commercially?
Yes — Ornith 1.0 weights are released under the MIT license on Hugging Face, which allows commercial use, modification, and redistribution. DeepReinforce has not yet posted pricing for a hosted API, so for now the only deployment path is self-hosting the open weights through vLLM, SGLang, or a similar serving stack.
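For the sizing question above, a rough weight-memory estimate helps when picking a variant to self-host. The sketch below is back-of-the-envelope arithmetic, not a figure from DeepReinforce: it assumes the number in each variant name is the total parameter count, uses 2 bytes per parameter for BF16 and 1 byte for FP8, and ignores KV cache, activations, and runtime overhead. The source does not say exactly which variants ship in FP8, so both columns are shown for all four. Note that a mixture-of-experts model must hold all experts in memory, so its footprint follows total parameters even though fewer are active per token.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Total parameter counts in billions, read off the variant names (assumption).
VARIANTS = {"9B Dense": 9, "31B Dense": 31, "35B MoE": 35, "397B MoE": 397}


def weight_memory_gb(total_params_billions: float, dtype: str) -> float:
    """Approximate weight-only footprint in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return float(total_params_billions * BYTES_PER_PARAM[dtype])


if __name__ == "__main__":
    for name, size in VARIANTS.items():
        bf16 = weight_memory_gb(size, "bf16")
        fp8 = weight_memory_gb(size, "fp8")
        print(f"{name:<10} bf16 ~{bf16:.0f} GB   fp8 ~{fp8:.0f} GB")
```

Under these assumptions the 9B model needs roughly 9 GB for weights in FP8 and 18 GB in BF16, while the 397B flagship needs roughly 397 GB in FP8, which points to a multi-GPU node rather than a single card.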

Try it

huggingface.co/collections/deepreinforce-ai/ornith-10


Tags

  • model
  • open-weights
  • ornith
  • deepreinforce
  • coding-model
  • agentic-coding
  • mixture-of-experts
  • reinforcement-learning
  • self-scaffolding
  • swe-bench
  • terminal-bench
  • mit-license
  • huggingface
