Overview
Gemini 1.5 Flash is the lightweight, speed-and-cost-optimized member of Google's Gemini 1.5 family, announced at Google I/O on May 14, 2024 and made generally available in June 2024. Where Gemini 1.5 Pro was the heavyweight flagship, Gemini 1.5 Flash was Google's answer to demand for a cheaper, faster model that still kept the headline feature of the 1.5 generation: a 1-million-token context window with native multimodal input across text, images, audio and video.
Technically, Gemini 1.5 Flash is a transformer decoder model that Google online-distilled from Gemini 1.5 Pro, so it shares the same long-context and multimodal capabilities but runs with much lower latency and serving cost. Google positioned it for high-volume, high-frequency work such as summarization, chat, captioning, and data extraction from long documents, rather than for the hardest reasoning tasks. In August 2024 Google cut its price sharply, to $0.075 per million input tokens and $0.30 per million output tokens for prompts under 128K tokens, making it one of the cheapest capable models of its era.
Gemini 1.5 Flash is now retired. Google deprecated the Gemini 1.5 line and shut the models down in 2025: gemini-1.5-flash-001 was discontinued on May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias was shut down around September 24-29, 2025. Developers were directed to migrate to Gemini 2.0 Flash (and later Gemini 2.5 Flash). It is documented here as a historical reference point in the Gemini Flash line.
| Released | 2024-05-14 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Parameters | Not disclosed |
| Context | 1,048,576 tokens (1M) |
| Max output | 8,192 tokens |
| Architecture | Transformer decoder model, online-distilled from the larger Gemini 1.5 Pro. It inherits Gemini 1.5 Pro's long-context and natively multimodal design (text, image, audio, video interleaved as input) but is tuned for lower latency and cheaper, higher-throughput serving. |
| Knowledge cutoff | November 2023 |
| Modalities | Text, Image, Audio, Video |
| Status | Retired. Gemini 1.5 Flash is deprecated and shut down on the Gemini API: gemini-1.5-flash-001 was discontinued May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias was shut down around September 24-29, 2025. Google recommends migrating to Gemini 2.0 Flash or a later Gemini Flash model. |
Benchmarks
- MMLU (general knowledge, multiple choice)78.9%
- GPQA (graduate-level Q&A, 0-shot)39.5%
- MATH (4-shot, Minerva prompt)54.9%
- GSM8K (grade-school math, 11-shot)86.2%
- HumanEval (code generation, pass rate)74.3%
- BIG-Bench Hard (multi-step reasoning, 3-shot)85.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.075 per 1M tokens (prompts ≤128K tokens) per 1M tokens |
|---|---|
| Output | $0.30 per 1M tokens (prompts ≤128K tokens) per 1M tokens |
Prices shown are after Google's August 12, 2024 reduction (input -78%, output -71%) for prompts under 128K tokens; higher-context prompts were billed at roughly double these rates. Model is now retired, so pricing is historical.
Strengths
- 1-million-token context window for processing very long documents, codebases, video and audio
- Native multimodal input: text, images, audio and video can be interleaved in a single prompt
- Very low latency and high throughput, designed for high-volume, high-frequency tasks
- Aggressively low pricing after the August 2024 cut ($0.075 / $0.30 per 1M input/output tokens under 128K)
- Strong long-context recall: near-perfect needle-in-a-haystack retrieval reported in the Gemini 1.5 technical report
- Distilled from Gemini 1.5 Pro, so it retains much of the larger model's capability at a fraction of the cost
Best for
- Summarizing long documents, transcripts, and multi-hour audio or video
- High-volume chat and customer-support assistants where latency and cost matter
- Image and video captioning and multimodal understanding at scale
- Structured data extraction from long PDFs, tables, and reports
- Classification, tagging, and routing tasks across large request volumes
- Reading and reasoning over large codebases within the 1M-token window
How to access
| Provider | Model ID |
|---|---|
| Google AI (Gemini API) ↗ | gemini-1.5-flash |
| Google Cloud Vertex AI ↗ | gemini-1.5-flash-002 |
Gemini Flash — every version
The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Gemini 3.5 Flashcurrent | 2026-05-19 | — | Proprietary |
| Gemini 3 Flash | 2025-12-17 | — | Proprietary |
| Gemini 2.5 Flash | 2025-04-17 | — | Proprietary |
| Gemini 2.0 Flash | 2025-01-30 | — | Proprietary |
| Gemini 1.5 Flash | 2024-05-14 | — | Proprietary |
FAQ
Is Gemini 1.5 Flash still available?
No. Gemini 1.5 Flash is retired. Google deprecated the Gemini 1.5 line and shut the models down in 2025 — gemini-1.5-flash-001 on May 27, 2025, and the gemini-1.5-flash / gemini-1.5-flash-002 alias around September 24-29, 2025. Google recommends migrating to Gemini 2.0 Flash or a newer Gemini Flash model.
What was the context window of Gemini 1.5 Flash?
Gemini 1.5 Flash had a 1-million-token context window (1,048,576 tokens), the same headline long-context capability as Gemini 1.5 Pro. It could take text, images, audio and video interleaved in a single prompt.
How much did Gemini 1.5 Flash cost?
After Google's August 2024 price cut, Gemini 1.5 Flash cost $0.075 per million input tokens and $0.30 per million output tokens for prompts under 128K tokens, with roughly double those rates for longer contexts. It was one of the cheapest capable models of its time. These prices are now historical since the model is retired.
How is Gemini 1.5 Flash different from Gemini 1.5 Pro?
Gemini 1.5 Flash is a transformer decoder model that Google online-distilled from Gemini 1.5 Pro. It keeps the same 1M-token multimodal context but is optimized for lower latency and cheaper, higher-volume serving, trading some peak reasoning quality for speed and cost. Pro was the heavyweight flagship; Flash was the fast, economical sibling.