AI/TLDR

AMAP-ML (Alibaba) · 2026-04-20 · notable

EMF: First Text-Conditioned MeanFlow Image Generator, Near 30-Step Quality in 4 Steps

EMF extends MeanFlow, a one-step image generation framework, to free-form text prompts for the first time. With 4-step inference it scores 0.90 on GenEval, within 0.01 of BLIP3o-NEXT's 30-step score of 0.91, and outperforms all distilled models. Accepted to CVPR 2026; #1 on Hugging Face papers today.

Figure 1 from EMF paper: visual comparison between EMF and SANA-Sprint on challenging text prompts using 4-step inference

EMF is the first text-conditioned generator in the MeanFlow one-step paradigm, coming within 0.01 GenEval of a 30-step baseline in just 4 denoising passes.

Key specs

GenEval (4-step, EMF): 0.90
GenEval (30-step baseline, BLIP3o-NEXT): 0.91
Hugging Face upvotes: 85 (#1 today)

What is it?

EMF (Extending MeanFlow to Text-to-Image) is a CVPR 2026 paper from Alibaba's AMAP-ML team that closes a key gap in MeanFlow, a one-step image generation framework. MeanFlow previously worked only with class-label conditioning; EMF makes it work with free-form text prompts for the first time within this paradigm. The code is on GitHub and builds on BLIP3o and MeanFlow.

How does it work?

The central finding is that text representations for one-step generation need two properties that standard LLM-based encoders lack: high discriminability (strong semantic differentiation between similar texts) and disentanglement (clear separation of distinct semantic components). EMF adapts BLIP3o-NEXT's LLM-based encoder to preserve these properties, then trains a MeanFlow-based generator that reaches a GenEval score of 0.90 in just 4 denoising steps. The 30-step baseline (BLIP3o-NEXT) scores 0.91, a gap of 0.01. EMF also outperforms all distilled models on DPG-Bench and HPS-v2.1.
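
To make the "few denoising steps" idea concrete, here is a minimal Python sketch of MeanFlow-style sampling, in which the network predicts an average velocity u over an interval [r, t] and each step jumps directly from t to r via z_r = z_t - (t - r) * u. This is not the EMF code: the function names, the prompt-keyed toy targets, and the closed-form toy velocity are all assumptions made for illustration, standing in for a trained text-conditioned network.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical signature: u(z_t, r, t, prompt) -> average velocity over [r, t].
AvgVelocity = Callable[[Sequence[float], float, float, str], Vector]


def few_step_sample(u: AvgVelocity, noise: Sequence[float],
                    prompt: str, steps: int = 4) -> Vector:
    """Walk from t=1 (pure noise) to t=0 (sample) in `steps` equal jumps."""
    z = list(noise)
    for i in range(steps, 0, -1):
        t, r = i / steps, (i - 1) / steps
        vel = u(z, r, t, prompt)
        # MeanFlow-style update: z_r = z_t - (t - r) * u(z_t, r, t)
        z = [zi - (t - r) * vi for zi, vi in zip(z, vel)]
    return z


# Toy stand-in for a trained network (assumption): each prompt maps to a
# single 2-D target point, so the exact average velocity is (z_t - x) / t.
TOY_TARGETS: Dict[str, Vector] = {
    "a red cube": [1.0, 0.0],
    "a blue sphere": [0.0, 1.0],
}


def toy_avg_velocity(z: Sequence[float], r: float, t: float,
                     prompt: str) -> Vector:
    target = TOY_TARGETS[prompt]
    return [(zi - xi) / t for zi, xi in zip(z, target)]


if __name__ == "__main__":
    start = [0.3, -0.7]  # arbitrary "noise" vector
    print(few_step_sample(toy_avg_velocity, start, "a red cube", steps=4))
    print(few_step_sample(toy_avg_velocity, start, "a blue sphere", steps=1))
```

In this toy setting the average velocity is exact, so 1 step and 4 steps reach the same point. With a learned network the prediction is only approximate, which is one reason extra steps can still improve quality.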

Why does it matter?

Standard diffusion models typically need 20–50 denoising steps at inference time. Four-step generation that comes within 0.01 GenEval of a 30-step baseline is a meaningful step toward real-time image synthesis, useful for interactive tools, video pipelines, and any system where diffusion latency is a bottleneck. AMAP-ML's DCW paper (also CVPR 2026, already in this feed) fixed a training-time bias; EMF pushes the inference efficiency frontier.

Who is it for?

ML researchers working on efficient image generation; teams building interactive or real-time diffusion pipelines.

Try it

git clone https://github.com/AMAP-ML/EMF && cd EMF && pip install -e .

Tags

  • diffusion-models
  • one-step-generation
  • meanflow
  • text-to-image
  • cvpr-2026
  • image-generation
  • efficiency
  • blip3o
  • alibaba
  • amap-ml
