Gemini 2.0 Flash

Name: Gemini 2.0 Flash
Author: Google

Google's fast, cheap multimodal workhorse with a 1M-token window — now retired.

Overview

Gemini 2.0 Flash was Google DeepMind's second-generation "workhorse" model, positioned as the fast, low-cost member of the Gemini 2.0 family alongside the lighter Flash-Lite and the larger Pro. Google first shipped it as an experimental model on December 11, 2024, then made it generally available through the Gemini API in Google AI Studio and Vertex AI on February 5, 2025. It was designed for the "agentic era," pairing low latency and a low price with native tool use and a 1-million-token context window.

Gemini 2.0 Flash accepted text, image, audio, and video as input and returned text. Google described image-generation and text-to-speech output, plus the Multimodal Live API, as features arriving after the initial release rather than at launch. Its knowledge cutoff is August 2024, it can read up to 1,048,576 tokens in a single request, and it can generate up to 8,192 output tokens. Unlike Gemini 1.5 Flash, it dropped the separate short- versus long-context pricing in favor of a single flat per-token rate.

Gemini 2.0 Flash is now retired: Google deprecated the gemini-2.0-flash model and shut it down on June 1, 2026, directing developers to newer Gemini Flash generations. This page documents it as a historical reference. Note that Google never published a parameter count or detailed architecture for the model, and it released few official benchmark numbers at launch — most of the quantitative figures below come from the model card Google published in April 2025.

Released	2025-02-05
License	Proprietary (Google) — closed source, served only via API
Weights	API only
Parameters	Undisclosed — Google never published the parameter count
Context	1,048,576 tokens (1M)
Max output	8,192 tokens
Architecture	Undisclosed proprietary sparse mixture-of-experts transformer; Google did not publish architectural details, layer counts, or parameter counts for Gemini 2.0 Flash.
Knowledge cutoff	August 2024
Modalities	text input, image input, audio input, video input, text output
Status	Retired — deprecated and shut down on June 1, 2026. Originally launched as an experimental model on December 11, 2024 and reached general availability on February 5, 2025.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.10 / 1M tokens (text, image, or video); $0.70 / 1M tokens for audio per 1M tokens
Cached input	$0.025 / 1M tokens (text/image/video context caching); $0.175 / 1M tokens (audio) per 1M tokens
Output	$0.40 / 1M tokens per 1M tokens

Standard paid tier. A batch tier was half price ($0.05 input / $0.20 output). The free developer tier also covered this model. Pricing was retired when the model shut down on June 1, 2026.

Pricing source ↗

Strengths

Very large 1-million-token context window — enough to hold an entire codebase, a book-length document, or hours of transcript in one request
Low price relative to capability, with a single flat per-token rate replacing Gemini 1.5 Flash's split short/long-context pricing
Multimodal input across text, image, audio, and video through one API endpoint
Native tool use / function calling, built for agentic workflows
Strong document-grounded factuality (84.6% FACTS Grounding) and solid math (89.7% MATH)
Available on Google's genuinely free developer tier, making it a practical default for prototyping

Best for

High-volume, cost-sensitive text generation, summarization, and classification
Long-document analysis — querying contracts, research papers, or full codebases in a single call
Multimodal apps that mix images, audio, or video with text prompts
Agentic and tool-using workflows via function calling
Real-time, low-latency chat assistants where response speed matters
RAG pipelines needing strong document-grounded factuality

How to access

Provider	Model ID
Google AI Studio / Gemini Developer API ↗	`gemini-2.0-flash`
Google Cloud Vertex AI ↗	`gemini-2.0-flash-001`

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.5 Flashcurrent	2026-05-19	—	Proprietary
Gemini 3 Flash	2025-12-17	—	Proprietary
Gemini 2.5 Flash	2025-04-17	—	Proprietary
Gemini 2.0 Flash	2025-01-30	—	Proprietary
Gemini 1.5 Flash	2024-05-14	—	Proprietary

FAQ

Is Gemini 2.0 Flash still available?

No. Google deprecated the gemini-2.0-flash model and shut it down on June 1, 2026. It can no longer be called through the Gemini API or Vertex AI; developers were directed to newer Gemini Flash generations. This page is a historical reference.

What was Gemini 2.0 Flash's context window and output limit?

It accepted up to 1,048,576 input tokens (1 million) in a single request and could generate up to 8,192 output tokens. Its knowledge cutoff was August 2024.

How much did Gemini 2.0 Flash cost?

On the standard paid tier it was $0.10 per million input tokens (text, image, or video; $0.70 for audio) and $0.40 per million output tokens, per Google's official pricing page. A batch tier ran at half those rates, and the model was also available on Google's free developer tier.

What could Gemini 2.0 Flash do, and what were its strengths?

It was a fast, low-cost multimodal model that accepted text, image, audio, and video input and produced text output, with native tool use for agentic workflows. Its standout traits were a 1-million-token context window, strong document-grounded factuality (84.6% on FACTS Grounding), and solid math (89.7% on MATH), all at a low price point.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash — every version

// FAQ