HiDream-ai · 2026-05-08 · notable
HiDream-O1-Image — 8B Pixel-Level Unified Transformer Open-Sources With #8 Spot on Artificial Analysis Text-to-Image Arena
An 8B unified transformer for image generation that runs directly on raw pixels, with no VAE and no separate text encoder. The release includes an MIT-licensed base model and a distilled Dev variant, and HiDream-O1-Image ranks #8 on the Artificial Analysis Text-to-Image Arena.

HiDream open-sources an 8B pixel-level transformer for image generation that beats larger DiTs and lands #8 on the Artificial Analysis arena.
Key specs
| Spec | Value |
|---|---|
| Parameters (base model) | 8B |
| GenEval | 0.90 |
| DPG-Bench | 89.83 |
| HPSv3 | 10.37 |
| LongText-Bench (EN) | 0.979 |
| LongText-Bench (ZH) | 0.978 |
| Max resolution | 2048 × 2048 |
| Artificial Analysis arena rank | #8 |
What is it?
HiDream-O1-Image is a single-network image generation model that handles text-to-image, instruction-based editing, multi-reference subject personalization, and storyboard generation in one unified architecture. The 8B base model and the distilled 9B Dev variant both ship under the MIT license. A larger 200B+ Pro variant remains closed.
How does it work?
It uses a Pixel-Level Unified Transformer (UiT) that encodes raw pixels, text, and task-specific conditions in a single shared token space, with no VAE and no separate text encoder. A standalone Reasoning-Driven Prompt Agent, built on Gemma-4-31B-it, optimizes the prompt before generation by resolving layout, subject attributes, and text-rendering details. The base model samples with 50 inference steps; the distilled Dev variant uses 28.
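The single-token-space idea can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not HiDream's implementation: the patch size, model width, task tags, random embedding tables, and the functions `patchify`, `num_patches`, and `build_sequence` are all hypothetical. It shows how a task tag, text tokens, optional reference images, and the raw-pixel canvas could be flattened into one sequence that a single transformer attends over, with no VAE latent step in between.

```python
import numpy as np

# Hypothetical hyperparameters for illustration only; not published values.
PATCH = 16      # assumed pixel patch size
D_MODEL = 64    # assumed (tiny) model width for the sketch

rng = np.random.default_rng(0)


def patchify(image: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split an (H, W, C) pixel array into (H/p * W/p, p*p*C) flat patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "size must be divisible by patch"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (patch_rows, patch_cols, p, p, C)
    return x.reshape(-1, patch * patch * c)


def num_patches(h: int, w: int, patch: int = PATCH) -> int:
    """Number of pixel tokens an (h, w) image produces at the given patch size."""
    return (h // patch) * (w // patch)


# Hypothetical "learned" parameters, randomly initialised for the sketch.
W_PIX = rng.standard_normal((PATCH * PATCH * 3, D_MODEL)) * 0.02  # pixel projection
TEXT_EMB = rng.standard_normal((1000, D_MODEL)) * 0.02            # toy 1000-token vocab
TASK_EMB = {
    name: rng.standard_normal(D_MODEL) * 0.02
    for name in ("t2i", "edit", "personalize", "storyboard")
}


def build_sequence(text_ids, task, canvas, references=()):
    """Concatenate task, text, reference-image and canvas tokens into one sequence."""
    parts = [TASK_EMB[task][None, :], TEXT_EMB[np.asarray(text_ids)]]
    for ref in references:  # e.g. a source image to edit or subject photos
        parts.append(patchify(ref) @ W_PIX)
    parts.append(patchify(canvas) @ W_PIX)  # the pixels being generated
    return np.concatenate(parts, axis=0)


img = rng.standard_normal((256, 256, 3))   # stand-in for a noisy canvas
ref = rng.standard_normal((128, 128, 3))   # stand-in for a reference image
seq = build_sequence([5, 42, 7], "edit", img, references=[ref])
print(seq.shape)                 # (324, 64): 1 task + 3 text + 64 ref + 256 canvas
print(num_patches(2048, 2048))   # 16384
```

Because nothing compresses the image into a latent first, sequence length scales with pixel count: at the 2048 × 2048 maximum and the assumed 16-pixel patch, the canvas alone is (2048 / 16)² = 16,384 tokens. Similarly, if each inference step costs roughly one forward pass, the Dev variant's 28 steps amount to about 56% of the base model's 50.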
Why does it matter?
Open-weight image generation has lived almost entirely inside the diffusion-transformer (DiT) paradigm, with separate VAE and text-encoder stages. A pixel-space unified architecture that wins on GenEval, DPG-Bench, HPSv3, and long-text rendering shows the design space is wider than DiT, and the MIT license keeps the 8B and 9B variants available for fine-tuning and downstream training.
Who is it for?
Image-generation researchers, indie creators, and anyone running image generation locally.
Try it
huggingface.co/spaces/HiDream-ai/HiDream-O1-Image