Gemini 1.5 Flash

Name: Gemini 1.5 Flash
Author: Google

Google's first speed-and-cost-optimized Gemini, with a 1M-token multimodal context window.

Overview

Gemini 1.5 Flash is the lightweight, speed-and-cost-optimized member of Google's Gemini 1.5 family, announced at Google I/O on May 14, 2024 and made generally available in June 2024. Where Gemini 1.5 Pro was the heavyweight flagship, Gemini 1.5 Flash was Google's answer to demand for a cheaper, faster model that still kept the headline feature of the 1.5 generation: a 1-million-token context window with native multimodal input across text, images, audio and video.

Technically, Gemini 1.5 Flash is a transformer decoder model that Google online-distilled from Gemini 1.5 Pro, so it shares the same long-context and multimodal capabilities but runs with much lower latency and serving cost. Google positioned it for high-volume, high-frequency work such as summarization, chat, captioning, and data extraction from long documents, rather than for the hardest reasoning tasks. In August 2024 Google cut its price sharply, to $0.075 per million input tokens and $0.30 per million output tokens for prompts under 128K tokens, making it one of the cheapest capable models of its era.

Gemini 1.5 Flash is now retired. Google deprecated the Gemini 1.5 line and shut the models down in 2025: gemini-1.5-flash-001 was discontinued on May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias was shut down around September 24-29, 2025. Developers were directed to migrate to Gemini 2.0 Flash (and later Gemini 2.5 Flash). It is documented here as a historical reference point in the Gemini Flash line.

Released	2024-05-14
License	Proprietary
Weights	API only
Parameters	Not disclosed
Context	1,048,576 tokens (1M)
Max output	8,192 tokens
Architecture	Transformer decoder model, online-distilled from the larger Gemini 1.5 Pro. It inherits Gemini 1.5 Pro's long-context and natively multimodal design (text, image, audio, video interleaved as input) but is tuned for lower latency and cheaper, higher-throughput serving.
Knowledge cutoff	November 2023
Modalities	Text, Image, Audio, Video
Status	Retired. Gemini 1.5 Flash is deprecated and shut down on the Gemini API: gemini-1.5-flash-001 was discontinued May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias was shut down around September 24-29, 2025. Google recommends migrating to Gemini 2.0 Flash or a later Gemini Flash model.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.075 per 1M tokens (prompts ≤128K tokens) per 1M tokens
Output	$0.30 per 1M tokens (prompts ≤128K tokens) per 1M tokens

Prices shown are after Google's August 12, 2024 reduction (input -78%, output -71%) for prompts under 128K tokens; higher-context prompts were billed at roughly double these rates. Model is now retired, so pricing is historical.

Pricing source ↗

Strengths

1-million-token context window for processing very long documents, codebases, video and audio
Native multimodal input: text, images, audio and video can be interleaved in a single prompt
Very low latency and high throughput, designed for high-volume, high-frequency tasks
Aggressively low pricing after the August 2024 cut ($0.075 / $0.30 per 1M input/output tokens under 128K)
Strong long-context recall: near-perfect needle-in-a-haystack retrieval reported in the Gemini 1.5 technical report
Distilled from Gemini 1.5 Pro, so it retains much of the larger model's capability at a fraction of the cost

Best for

Summarizing long documents, transcripts, and multi-hour audio or video
High-volume chat and customer-support assistants where latency and cost matter
Image and video captioning and multimodal understanding at scale
Structured data extraction from long PDFs, tables, and reports
Classification, tagging, and routing tasks across large request volumes
Reading and reasoning over large codebases within the 1M-token window

How to access

Provider	Model ID
Google AI (Gemini API) ↗	`gemini-1.5-flash`
Google Cloud Vertex AI ↗	`gemini-1.5-flash-002`

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.5 Flashcurrent	2026-05-19	—	Proprietary
Gemini 3 Flash	2025-12-17	—	Proprietary
Gemini 2.5 Flash	2025-04-17	—	Proprietary
Gemini 2.0 Flash	2025-01-30	—	Proprietary
Gemini 1.5 Flash	2024-05-14	—	Proprietary

FAQ

Is Gemini 1.5 Flash still available?

No. Gemini 1.5 Flash is retired. Google deprecated the Gemini 1.5 line and shut the models down in 2025 — gemini-1.5-flash-001 on May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias around September 24-29, 2025. Google recommends migrating to Gemini 2.0 Flash or a newer Gemini Flash model.

What was the context window of Gemini 1.5 Flash?

Gemini 1.5 Flash had a 1-million-token context window (1,048,576 tokens), the same headline long-context capability as Gemini 1.5 Pro. It could take text, images, audio and video interleaved in a single prompt.

How much did Gemini 1.5 Flash cost?

After Google's August 2024 price cut, Gemini 1.5 Flash cost $0.075 per million input tokens and $0.30 per million output tokens for prompts under 128K tokens, with roughly double those rates for longer contexts. It was one of the cheapest capable models of its time. These prices are now historical since the model is retired.

How is Gemini 1.5 Flash different from Gemini 1.5 Pro?

Gemini 1.5 Flash is a transformer decoder model that Google online-distilled from Gemini 1.5 Pro. It keeps the same 1M-token multimodal context but is optimized for lower latency and cheaper, higher-volume serving, trading some peak reasoning quality for speed and cost. Pro was the heavyweight flagship; Flash was the fast, economical sibling.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash — every version

// FAQ