Technion · 2026-04-14 · notable
DDTree — Diffusion Draft Trees for Faster Speculative Decoding
DDTree builds a multi-branch draft tree from a single block diffusion pass, then verifies the whole tree in one target-model forward pass. It achieves up to an 8.22× speedup over autoregressive decoding and outperforms EAGLE-3 on math benchmarks.
Speculative decoding via diffusion draft trees — up to 8.22× speedup over autoregressive inference, beating EAGLE-3.
Key specs
| Metric | Value |
|---|---|
| GitHub stars | 196 |
| Max speedup | 8.22× (HumanEval, Qwen3-30B-MoE) |
| MATH-500 speedup | 7.5× |
| GSM8K speedup | 6.6× |
| vs. DFlash | 2.13× additional gain |
What is it?
DDTree (Diffusion Draft Tree) accelerates LLM inference using speculative decoding. Instead of a single candidate token sequence per verification round, it builds a full tree of likely continuations from one block diffusion pass, then verifies the entire tree in a single forward pass of the target model. It is an extension of DFlash (block diffusion speculative decoding) that replaces its single-path drafting with tree-based drafting.
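To see why a tree can help compared with a single drafted path, here is a minimal sketch of tree verification under greedy decoding. It is an illustration only: the function name, the `children` and `target_choice` structures, and the greedy acceptance rule are assumptions made for this example, not DDTree's actual interface or acceptance procedure.

```python
def accept_longest_path(children, target_choice):
    """Walk the draft tree along the branch the target model agrees with.

    children:      parent node id -> {drafted token: child node id}; node 0 is
                   the root (the last already-committed token).
    target_choice: node id -> token the target model picks after that node,
                   all obtained from the single verification pass.
    Both structures are hypothetical, chosen for illustration.
    """
    accepted = []
    node = 0
    while True:
        token = target_choice[node]
        accepted.append(token)  # the target's own choice is always kept
        next_node = children.get(node, {}).get(token)
        if next_node is None:   # no drafted branch matches: stop here
            break
        node = next_node
    return accepted


# Two drafted branches: "the cat" and "a dog".
children = {0: {"the": 1, "a": 2}, 1: {"cat": 3}, 2: {"dog": 4}}
target_choice = {0: "a", 1: "cat", 2: "dog", 3: "sat", 4: "ran"}
print(accept_longest_path(children, target_choice))  # ['a', 'dog', 'ran']
```

In this toy case, a drafter that proposed only the single path "the cat" would have yielded one token for the round, while the tree yields three because the second branch matches the target's choices.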
How does it work?
The drafter (a small block diffusion model) produces a probability distribution over tokens at each position of the draft block. DDTree selects branches to explore using a best-first heap algorithm under a fixed node budget, building a tree that maximizes the probability of finding accepted tokens. The target model verifies the full tree simultaneously using an ancestor-only attention mask, so each tree node attends only to itself and the nodes on its path back to the root. The method is lossless: the target model's output distribution is preserved exactly, so generation quality does not change.
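The two mechanisms described above, best-first selection under a node budget and the ancestor-only mask, can be sketched roughly as follows. This is a simplified illustration and not the repository's code: it assumes a path is scored by the product of the drafter's per-position probabilities, and the names `build_draft_tree`, `ancestor_mask`, and the node dictionary layout are hypothetical.

```python
import heapq
import math


def build_draft_tree(position_probs, node_budget):
    """Best-first selection of draft-tree nodes under a node budget.

    position_probs[d] maps token -> drafter probability at draft position d.
    Returns a list of nodes; each node's "parent" is an index into that list,
    or -1 when the node hangs directly off the already-committed context.
    """
    # Rank each position's candidates once, most probable first; drop zeros.
    ranked = [
        sorted(((t, p) for t, p in probs.items() if p > 0), key=lambda tp: -tp[1])
        for probs in position_probs
    ]
    nodes = []
    if node_budget <= 0 or not ranked or not ranked[0]:
        return nodes

    # Heap entry: (negative path log-prob, tie-breaker, parent, depth, rank).
    heap = [(-math.log(ranked[0][0][1]), 0, -1, 0, 0)]
    pushed = 1
    while heap and len(nodes) < node_budget:
        neg_logp, _, parent, depth, rank = heapq.heappop(heap)
        token, prob = ranked[depth][rank]
        logp = -neg_logp
        nodes.append({"token": token, "parent": parent, "depth": depth, "logp": logp})
        index = len(nodes) - 1

        # Next-best sibling: same parent, next-ranked token at this position.
        if rank + 1 < len(ranked[depth]):
            sibling_logp = logp - math.log(prob) + math.log(ranked[depth][rank + 1][1])
            heapq.heappush(heap, (-sibling_logp, pushed, parent, depth, rank + 1))
            pushed += 1
        # Best child: top-ranked token at the next position.
        if depth + 1 < len(ranked) and ranked[depth + 1]:
            child_logp = logp + math.log(ranked[depth + 1][0][1])
            heapq.heappush(heap, (-child_logp, pushed, index, depth + 1, 0))
            pushed += 1
    return nodes


def ancestor_mask(nodes):
    """mask[i][j] is True when node i may attend to node j (itself or an ancestor)."""
    n = len(nodes)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = nodes[j]["parent"]
    return mask


tree = build_draft_tree([{"a": 0.6, "b": 0.4}, {"c": 0.7, "d": 0.3}], node_budget=4)
print([(n["token"], n["parent"]) for n in tree])
# [('a', -1), ('c', 0), ('b', -1), ('c', 2)]
mask = ancestor_mask(tree)
```

Because a child's score never exceeds its parent's and a sibling's never exceeds the previous sibling's, pushing only the next sibling and the best child is enough for the heap to emit nodes in descending path probability. In a real verifier the mask would additionally let every tree node attend to the full committed prefix, which is omitted here.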
Why does it matter?
Getting more tokens per second from large models without changing their outputs is directly valuable for production inference costs. DDTree achieves 8.22× speedup on HumanEval with Qwen3-30B-MoE and outperforms EAGLE-3, a strong autoregressive drafter, on math benchmarks. The implementation is public and the benchmarks are reproducible.
Who is it for?
ML engineers optimizing LLM serving latency and throughput.
Try it
git clone https://github.com/liranringel/ddtree && cd ddtree && pip install -r requirements.txt && bash run_benchmark.sh