AI/TLDR

Gemini 2.0 Flash

Google's fast, cheap multimodal workhorse with a 1M-token window — now retired.

Overview

Gemini 2.0 Flash was Google DeepMind's second-generation "workhorse" model, positioned as the fast, low-cost member of the Gemini 2.0 family alongside the lighter Flash-Lite and the larger Pro. Google first shipped it as an experimental model on December 11, 2024, then made it generally available through the Gemini API in Google AI Studio and Vertex AI on February 5, 2025. It was designed for the "agentic era," pairing low latency and a low price with native tool use and a 1-million-token context window.

Gemini 2.0 Flash accepted text, image, audio, and video as input and returned text. Google described image-generation and text-to-speech output, plus the Multimodal Live API, as features arriving after the initial release rather than at launch. Its knowledge cutoff is August 2024, it can read up to 1,048,576 tokens in a single request, and it can generate up to 8,192 output tokens. Unlike Gemini 1.5 Flash, it dropped the separate short- versus long-context pricing in favor of a single flat per-token rate.

Gemini 2.0 Flash is now retired: Google deprecated the gemini-2.0-flash model and shut it down on June 1, 2026, directing developers to newer Gemini Flash generations. This page documents it as a historical reference. Note that Google never published a parameter count or detailed architecture for the model, and it released few official benchmark numbers at launch — most of the quantitative figures below come from the model card Google published in April 2025.

Released2025-02-05
LicenseProprietary (Google) — closed source, served only via API
WeightsAPI only
ParametersUndisclosed — Google never published the parameter count
Context1,048,576 tokens (1M)
Max output8,192 tokens
ArchitectureUndisclosed proprietary sparse mixture-of-experts transformer; Google did not publish architectural details, layer counts, or parameter counts for Gemini 2.0 Flash.
Knowledge cutoffAugust 2024
Modalitiestext input, image input, audio input, video input, text output
StatusRetired — deprecated and shut down on June 1, 2026. Originally launched as an experimental model on December 11, 2024 and reached general availability on February 5, 2025.

Benchmarks

  1. MMLU-Pro76.4%
  2. GPQA Diamond62.1%
  3. MATH89.7%
  4. MMMU (multimodal reasoning)70.7%
  5. Natural2Code92.9%
  6. FACTS Grounding84.6%
  7. Global MMLU (Lite)83.4%
  8. SimpleQA (no search)29.9%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.10 / 1M tokens (text, image, or video); $0.70 / 1M tokens for audio per 1M tokens
Cached input$0.025 / 1M tokens (text/image/video context caching); $0.175 / 1M tokens (audio) per 1M tokens
Output$0.40 / 1M tokens per 1M tokens

Standard paid tier. A batch tier was half price ($0.05 input / $0.20 output). The free developer tier also covered this model. Pricing was retired when the model shut down on June 1, 2026.

Pricing source ↗

Strengths

  • Very large 1-million-token context window — enough to hold an entire codebase, a book-length document, or hours of transcript in one request
  • Low price relative to capability, with a single flat per-token rate replacing Gemini 1.5 Flash's split short/long-context pricing
  • Multimodal input across text, image, audio, and video through one API endpoint
  • Native tool use / function calling, built for agentic workflows
  • Strong document-grounded factuality (84.6% FACTS Grounding) and solid math (89.7% MATH)
  • Available on Google's genuinely free developer tier, making it a practical default for prototyping

Best for

  • High-volume, cost-sensitive text generation, summarization, and classification
  • Long-document analysis — querying contracts, research papers, or full codebases in a single call
  • Multimodal apps that mix images, audio, or video with text prompts
  • Agentic and tool-using workflows via function calling
  • Real-time, low-latency chat assistants where response speed matters
  • RAG pipelines needing strong document-grounded factuality

How to access

ProviderModel ID
Google AI Studio / Gemini Developer API ↗gemini-2.0-flash
Google Cloud Vertex AI ↗gemini-2.0-flash-001

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Gemini 3.5 Flashcurrent2026-05-19Proprietary
Gemini 3 Flash2025-12-17Proprietary
Gemini 2.5 Flash2025-04-17Proprietary
Gemini 2.0 Flash2025-01-30Proprietary
Gemini 1.5 Flash2024-05-14Proprietary

FAQ

Is Gemini 2.0 Flash still available?

No. Google deprecated the gemini-2.0-flash model and shut it down on June 1, 2026. It can no longer be called through the Gemini API or Vertex AI; developers were directed to newer Gemini Flash generations. This page is a historical reference.

What was Gemini 2.0 Flash's context window and output limit?

It accepted up to 1,048,576 input tokens (1 million) in a single request and could generate up to 8,192 output tokens. Its knowledge cutoff was August 2024.

How much did Gemini 2.0 Flash cost?

On the standard paid tier it was $0.10 per million input tokens (text, image, or video; $0.70 for audio) and $0.40 per million output tokens, per Google's official pricing page. A batch tier ran at half those rates, and the model was also available on Google's free developer tier.

What could Gemini 2.0 Flash do, and what were its strengths?

It was a fast, low-cost multimodal model that accepted text, image, audio, and video input and produced text output, with native tool use for agentic workflows. Its standout traits were a 1-million-token context window, strong document-grounded factuality (84.6% on FACTS Grounding), and solid math (89.7% on MATH), all at a low price point.