AI/TLDR

Gemini Diffusion

Google's experimental text-diffusion model that writes ~1,500 tokens/second

Overview

Gemini Diffusion is an experimental research model from Google DeepMind, unveiled at Google I/O on May 20, 2025. It is Google's first language model to use a diffusion approach instead of the autoregressive transformer decoding that powers the rest of the Gemini family. Rather than emitting text one token at a time, Gemini Diffusion learns to turn random noise into coherent text through repeated refinement steps, much like image models such as Imagen and Stable Diffusion do for pixels.

The headline advantage is raw speed. Google's own demo page reports a sampling speed of 1,479 tokens per second (with 0.84 seconds of overhead), and Google describes the model as running roughly 5x faster than Gemini 2.0 Flash-Lite while matching its coding performance. Independent testers measured real generations in the 850-2,000 tokens-per-second range, fast enough that long code snippets and HTML pages appear almost instantly. Because the model denoises whole blocks at once, it can also revise earlier parts of an answer during generation rather than being locked into each token as it goes.
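As a rough illustration of what those figures mean in practice, the sketch below estimates end-to-end generation time from the two numbers on Google's demo page. The additive model (fixed overhead plus tokens divided by sampling rate) is an assumption made here for illustration, not something Google documents, and the function names are hypothetical.

```python
# Figures quoted on Google's demo page (see the paragraph above).
SAMPLING_RATE_TOK_PER_S = 1479.0
OVERHEAD_S = 0.84


def estimated_seconds(num_tokens: int,
                      rate: float = SAMPLING_RATE_TOK_PER_S,
                      overhead: float = OVERHEAD_S) -> float:
    """Assumed model: fixed overhead plus tokens divided by sampling rate."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return overhead + num_tokens / rate


def effective_tokens_per_second(num_tokens: int,
                                rate: float = SAMPLING_RATE_TOK_PER_S,
                                overhead: float = OVERHEAD_S) -> float:
    """End-to-end throughput once the fixed overhead is included."""
    return num_tokens / estimated_seconds(num_tokens, rate, overhead)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n,
              round(estimated_seconds(n), 2),
              round(effective_tokens_per_second(n)))
```

Under this assumed model a 1,000-token reply takes roughly 1.5 seconds, and short replies are dominated by the fixed overhead, so the effective rate only approaches the quoted sampling speed on long outputs.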

Gemini Diffusion is not a generally available product. At launch it was shared only with trusted testers as a waitlist-gated experimental demo, and Google positioned it as a research preview meant to inform future, faster models. It is text-only, and Google has not published a parameter count, context window, or pricing for it. The diffusion ideas explored here later surfaced in DiffusionGemma, a separate open-weights model Google released in June 2026.

Released: 2025-05-20
License: Proprietary (Google, waitlist demo)
Weights: Not released (waitlist demo access only)
Parameters: Not disclosed
Max output: Not disclosed
Architecture: Text diffusion model. Instead of predicting one token at a time like an autoregressive transformer, it generates text by starting from noise and refining it over several denoising steps, producing whole blocks of tokens in parallel. This lets it correct mistakes mid-generation and reach very high sampling speeds.
Knowledge cutoff: Not disclosed
Modalities: Text
Status: Experimental research demo (waitlist access)
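The refinement loop described in the architecture entry can be pictured with a toy example. This is only a sketch of the general idea (start from noise, re-predict every position in parallel, repeat), not Google's algorithm, whose details are unpublished. The `toy_denoise_step` function is a hypothetical stand-in for a learned denoiser and cheats by being handed the target text.

```python
import random


def toy_denoise_step(tokens, target, rng, fix_prob):
    """Hypothetical stand-in for a learned denoiser.

    It visits every position in a single pass and, with probability
    `fix_prob`, replaces the current token with the correct one. A real
    model never sees `target`; it is used here only so the toy converges.
    """
    return [want if rng.random() < fix_prob else have
            for have, want in zip(tokens, target)]


def generate(target, steps=4, seed=0):
    """Refine random tokens into `target` over `steps` parallel passes."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rng = random.Random(seed)
    vocab = sorted(set(target))
    tokens = [rng.choice(vocab) for _ in target]  # start from "noise"
    history = [tokens]
    for step in range(1, steps + 1):
        fix_prob = step / steps  # reaches 1.0 on the final pass
        # Every position is reconsidered on each pass, so a wrong token
        # from an earlier pass can still be revised later.
        tokens = toy_denoise_step(tokens, target, rng, fix_prob)
        history.append(tokens)
    return history


if __name__ == "__main__":
    sentence = "diffusion models refine whole blocks of text in parallel".split()
    for i, snapshot in enumerate(generate(sentence)):
        print(i, " ".join(snapshot))
```

In this toy the nine-word sentence is finished after four denoiser passes, whereas one-token-at-a-time decoding would need nine sequential predictions. That gap between passes and tokens is the intuition behind the speed claims, under the assumption that each pass costs about the same as one autoregressive step.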

Benchmarks

  1. HumanEval: 89.6%
  2. MBPP: 76%
  3. LiveCodeBench (v6): 30.9%
  4. BigCodeBench: 45.4%
  5. LBPP (v2): 56.8%
  6. AIME 2025: 23.3%
  7. GPQA Diamond: 40.4%
  8. Global MMLU (Lite): 69.1%
  9. SWE-Bench Verified: 22.9%

Scores are percentages on a 0–100 scale; higher is better.

Strengths

  • Extremely fast generation: 1,479 tokens/second on Google's demo page, which Google describes as roughly 5x faster than Gemini 2.0 Flash-Lite
  • Competitive coding quality despite the speed: nearly matches Gemini 2.0 Flash-Lite on HumanEval, MBPP and BigCodeBench, and edges ahead on LiveCodeBench v6 and AIME 2025
  • Block-parallel denoising lets it self-correct mid-generation instead of committing to each token irreversibly
  • Demonstrates that text diffusion is viable at scale, opening a faster alternative to standard token-by-token decoding

Best for

  • Low-latency code generation and editing where near-instant output matters
  • Interactive demos and prototyping that benefit from very short end-to-end generation time
  • Research and experimentation with non-autoregressive (diffusion) text generation
  • Exploring iterative draft-and-refine workflows that exploit the model's mid-generation error correction

Gemini Diffusion / DiffusionGemma — every version

The full lineage of the Gemini Diffusion / DiffusionGemma line, newest first.

Version | Released | Context | License
DiffusionGemma (current) | 2026-06-10 | (not listed) | Apache-2.0
Gemini Diffusion | 2025-05-20 | (not listed) | Proprietary

FAQ

What is Gemini Diffusion?

Gemini Diffusion is an experimental research model from Google DeepMind, shown at Google I/O on May 20, 2025. It is Google's first text-generation language model to use a diffusion process: instead of predicting tokens one at a time, it turns random noise into text through repeated refinement steps, generating whole blocks in parallel.

How fast is Gemini Diffusion?

Google's demo page lists a sampling speed of 1,479 tokens per second with 0.84 seconds of overhead, and describes the model as roughly 5x faster than Gemini 2.0 Flash-Lite. Independent testers reported real-world speeds in the 850-2,000 tokens-per-second range.

Can I use Gemini Diffusion through an API?

No. Gemini Diffusion was released only as a waitlist-gated experimental demo for trusted testers, not as a generally available or paid API product. Google has not published pricing, a parameter count, or a context window for it.

How is Gemini Diffusion different from DiffusionGemma?

Gemini Diffusion is the closed, experimental research demo Google showed in May 2025. DiffusionGemma is a separate open-weights diffusion model (a 25.2B-total / 3.8B-active mixture-of-experts) that Google released later, in June 2026, under the Apache 2.0 license with weights on Hugging Face.