AI/TLDR

Gemini 1.5 Flash

Google's first speed-and-cost-optimized Gemini, with a 1M-token multimodal context window.

Overview

Gemini 1.5 Flash is the lightweight, speed-and-cost-optimized member of Google's Gemini 1.5 family, announced at Google I/O on May 14, 2024 and made generally available in June 2024. Where Gemini 1.5 Pro was the heavyweight flagship, Gemini 1.5 Flash was Google's answer to demand for a cheaper, faster model that still kept the headline feature of the 1.5 generation: a 1-million-token context window with native multimodal input across text, images, audio and video.

Technically, Gemini 1.5 Flash is a transformer decoder model that Google online-distilled from Gemini 1.5 Pro, so it shares the same long-context and multimodal capabilities but runs with much lower latency and serving cost. Google positioned it for high-volume, high-frequency work such as summarization, chat, captioning, and data extraction from long documents, rather than for the hardest reasoning tasks. In August 2024 Google cut its price sharply, to $0.075 per million input tokens and $0.30 per million output tokens for prompts under 128K tokens, making it one of the cheapest capable models of its era.

Gemini 1.5 Flash is now retired. Google deprecated the Gemini 1.5 line and shut the models down in 2025: gemini-1.5-flash-001 was discontinued on May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias was shut down around September 24-29, 2025. Developers were directed to migrate to Gemini 2.0 Flash (and later Gemini 2.5 Flash). It is documented here as a historical reference point in the Gemini Flash line.

Released2024-05-14
LicenseProprietary
WeightsAPI only
ParametersNot disclosed
Context1,048,576 tokens (1M)
Max output8,192 tokens
ArchitectureTransformer decoder model, online-distilled from the larger Gemini 1.5 Pro. It inherits Gemini 1.5 Pro's long-context and natively multimodal design (text, image, audio, video interleaved as input) but is tuned for lower latency and cheaper, higher-throughput serving.
Knowledge cutoffNovember 2023
ModalitiesText, Image, Audio, Video
StatusRetired. Gemini 1.5 Flash is deprecated and shut down on the Gemini API: gemini-1.5-flash-001 was discontinued May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias was shut down around September 24-29, 2025. Google recommends migrating to Gemini 2.0 Flash or a later Gemini Flash model.

Benchmarks

  1. MMLU (general knowledge, multiple choice)78.9%
  2. GPQA (graduate-level Q&A, 0-shot)39.5%
  3. MATH (4-shot, Minerva prompt)54.9%
  4. GSM8K (grade-school math, 11-shot)86.2%
  5. HumanEval (code generation, pass rate)74.3%
  6. BIG-Bench Hard (multi-step reasoning, 3-shot)85.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.075 per 1M tokens (prompts ≤128K tokens) per 1M tokens
Output$0.30 per 1M tokens (prompts ≤128K tokens) per 1M tokens

Prices shown are after Google's August 12, 2024 reduction (input -78%, output -71%) for prompts under 128K tokens; higher-context prompts were billed at roughly double these rates. Model is now retired, so pricing is historical.

Pricing source ↗

Strengths

  • 1-million-token context window for processing very long documents, codebases, video and audio
  • Native multimodal input: text, images, audio and video can be interleaved in a single prompt
  • Very low latency and high throughput, designed for high-volume, high-frequency tasks
  • Aggressively low pricing after the August 2024 cut ($0.075 / $0.30 per 1M input/output tokens under 128K)
  • Strong long-context recall: near-perfect needle-in-a-haystack retrieval reported in the Gemini 1.5 technical report
  • Distilled from Gemini 1.5 Pro, so it retains much of the larger model's capability at a fraction of the cost

Best for

  • Summarizing long documents, transcripts, and multi-hour audio or video
  • High-volume chat and customer-support assistants where latency and cost matter
  • Image and video captioning and multimodal understanding at scale
  • Structured data extraction from long PDFs, tables, and reports
  • Classification, tagging, and routing tasks across large request volumes
  • Reading and reasoning over large codebases within the 1M-token window

How to access

ProviderModel ID
Google AI (Gemini API) ↗gemini-1.5-flash
Google Cloud Vertex AI ↗gemini-1.5-flash-002

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Gemini 3.5 Flashcurrent2026-05-19Proprietary
Gemini 3 Flash2025-12-17Proprietary
Gemini 2.5 Flash2025-04-17Proprietary
Gemini 2.0 Flash2025-01-30Proprietary
Gemini 1.5 Flash2024-05-14Proprietary

FAQ

Is Gemini 1.5 Flash still available?

No. Gemini 1.5 Flash is retired. Google deprecated the Gemini 1.5 line and shut the models down in 2025 — gemini-1.5-flash-001 on May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias around September 24-29, 2025. Google recommends migrating to Gemini 2.0 Flash or a newer Gemini Flash model.

What was the context window of Gemini 1.5 Flash?

Gemini 1.5 Flash had a 1-million-token context window (1,048,576 tokens), the same headline long-context capability as Gemini 1.5 Pro. It could take text, images, audio and video interleaved in a single prompt.

How much did Gemini 1.5 Flash cost?

After Google's August 2024 price cut, Gemini 1.5 Flash cost $0.075 per million input tokens and $0.30 per million output tokens for prompts under 128K tokens, with roughly double those rates for longer contexts. It was one of the cheapest capable models of its time. These prices are now historical since the model is retired.

How is Gemini 1.5 Flash different from Gemini 1.5 Pro?

Gemini 1.5 Flash is a transformer decoder model that Google online-distilled from Gemini 1.5 Pro. It keeps the same 1M-token multimodal context but is optimized for lower latency and cheaper, higher-volume serving, trading some peak reasoning quality for speed and cost. Pro was the heavyweight flagship; Flash was the fast, economical sibling.