Overview
Gemini 2.5 Flash is Google's fast, cost-efficient member of the Gemini 2.5 family, first released in preview on April 17, 2025 and later promoted to general availability. It sits below Gemini 2.5 Pro but above the older 2.0 Flash line, and Google positions it as the workhorse model that gives the best cost-per-intelligence in the lineup.
Its headline feature is hybrid reasoning: Gemini 2.5 Flash is a "thinking" model that can reason step by step before answering, and developers can control a thinking budget to trade off quality against speed and cost. Thinking tokens are billed at the output rate, so turning thinking down directly lowers cost. The model accepts text, images, video, audio, and PDF documents as input and returns text, with a 1M-token context window and up to 64K output tokens.
Built as a sparse mixture-of-experts transformer, Gemini 2.5 Flash has a January 2025 knowledge cutoff and is served through the Gemini API in Google AI Studio and Vertex AI. Google later shipped improved 2.5 Flash previews and newer Flash generations (Gemini 3 Flash, 3.5 Flash), so the original 2.5 Flash is now a previous-generation model, though it remains widely used for high-volume, latency-sensitive work.
| Released | 2025-04-17 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Context | 1M |
| Max output | 64K |
| Architecture | Sparse mixture-of-experts (MoE) transformer with native multimodal support for text, vision, and audio. Total and active parameter counts are not publicly disclosed. |
| Knowledge cutoff | January 2025 |
| Modalities | Text, Vision, Audio, Video, PDF |
| Status | Superseded |
Benchmarks
- GPQA (diamond, no tools)82.8%
- AIME 202572%
- LiveCodeBench59.3%
- SWE-bench Verified (multiple attempts)60.3%
- Aider Polyglot (single attempt)56.7%
- Humanity's Last Exam (no tools)11%
- Global MMLU (Lite)88.4%
- MMMU79.7%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.30 / 1M tokens (text / image / video); $1.00 / 1M tokens (audio) per 1M tokens |
|---|---|
| Cached input | $0.03 / 1M tokens (text / image / video) per 1M tokens |
| Output | $2.50 / 1M tokens (includes thinking tokens) per 1M tokens |
Paid-tier Gemini Developer API pricing. Output price includes thinking tokens. Context caching also has a storage fee of $1.00 per 1M tokens per hour.
Strengths
- Hybrid reasoning with a controllable thinking budget, so you can dial reasoning up for hard tasks or off for cheap, fast responses
- Low price for a reasoning-capable multimodal model: $0.30 per million input tokens and $2.50 per million output tokens
- 1M-token context window for long documents, large codebases, and extended multimodal inputs
- Native multimodal input across text, images, video, audio, and PDF documents
- Strong cost-per-intelligence: outscores previous Flash models and the year-old Gemini 1.5 Pro on most evals
Best for
- High-volume chat, summarization, and extraction where cost and latency matter
- Long-context document and PDF analysis within a single 1M-token window
- Multimodal tasks combining text with images, audio, or video
- Agentic and tool-use workflows that need reasoning without flagship-tier pricing
- Coding assistance and code understanding at scale
How to access
| Provider | Model ID |
|---|---|
| Google AI Studio / Gemini API ↗ | gemini-2.5-flash |
| Google Vertex AI ↗ | gemini-2.5-flash |
Gemini Flash — every version
The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Gemini 3.5 Flashcurrent | 2026-05-19 | — | Proprietary |
| Gemini 3 Flash | 2025-12-17 | — | Proprietary |
| Gemini 2.5 Flash | 2025-04-17 | — | Proprietary |
| Gemini 2.0 Flash | 2025-01-30 | — | Proprietary |
| Gemini 1.5 Flash | 2024-05-14 | — | Proprietary |
FAQ
When was Gemini 2.5 Flash released?
Google launched Gemini 2.5 Flash in preview for developers on April 17, 2025, via the Gemini API in Google AI Studio and Vertex AI. It later moved to general availability, and Google shipped improved 2.5 Flash previews afterward.
How much does Gemini 2.5 Flash cost?
On the paid Gemini Developer API tier it costs $0.30 per million input tokens (text, image, or video) and $2.50 per million output tokens. Audio input is $1.00 per million tokens, and cached text input is $0.03 per million tokens. Thinking tokens are billed at the output rate.
What is the context window of Gemini 2.5 Flash?
Gemini 2.5 Flash has a 1 million-token input context window and can produce up to 64K output tokens. It accepts text, images, video, audio, and PDF documents as input.
What makes Gemini 2.5 Flash different from Gemini 2.5 Pro?
Both are hybrid "thinking" models, but Flash is tuned for speed and low cost while Pro is the more capable flagship. On Google's own benchmarks Pro scores higher (for example 86.4% vs 82.8% on GPQA diamond and 88.0% vs 72.0% on AIME 2025), but Flash is significantly cheaper, making it the better choice for high-volume, latency-sensitive workloads.