AI/TLDR

Gemini 2.5 Flash

Google's fast, low-cost Gemini with switchable "thinking" and a 1M-token context window.

Overview

Gemini 2.5 Flash is Google's fast, cost-efficient member of the Gemini 2.5 family, first released in preview on April 17, 2025 and later promoted to general availability. It sits below Gemini 2.5 Pro but above the older 2.0 Flash line, and Google positions it as the workhorse model that gives the best cost-per-intelligence in the lineup.

Its headline feature is hybrid reasoning: Gemini 2.5 Flash is a "thinking" model that can reason step by step before answering, and developers can control a thinking budget to trade off quality against speed and cost. Thinking tokens are billed at the output rate, so turning thinking down directly lowers cost. The model accepts text, images, video, audio, and PDF documents as input and returns text, with a 1M-token context window and up to 64K output tokens.

Built as a sparse mixture-of-experts transformer, Gemini 2.5 Flash has a January 2025 knowledge cutoff and is served through the Gemini API in Google AI Studio and Vertex AI. Google later shipped improved 2.5 Flash previews and newer Flash generations (Gemini 3 Flash, 3.5 Flash), so the original 2.5 Flash is now a previous-generation model, though it remains widely used for high-volume, latency-sensitive work.

Released2025-04-17
LicenseProprietary
WeightsAPI only
Context1M
Max output64K
ArchitectureSparse mixture-of-experts (MoE) transformer with native multimodal support for text, vision, and audio. Total and active parameter counts are not publicly disclosed.
Knowledge cutoffJanuary 2025
ModalitiesText, Vision, Audio, Video, PDF
StatusSuperseded

Benchmarks

  1. GPQA (diamond, no tools)82.8%
  2. AIME 202572%
  3. LiveCodeBench59.3%
  4. SWE-bench Verified (multiple attempts)60.3%
  5. Aider Polyglot (single attempt)56.7%
  6. Humanity's Last Exam (no tools)11%
  7. Global MMLU (Lite)88.4%
  8. MMMU79.7%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.30 / 1M tokens (text / image / video); $1.00 / 1M tokens (audio) per 1M tokens
Cached input$0.03 / 1M tokens (text / image / video) per 1M tokens
Output$2.50 / 1M tokens (includes thinking tokens) per 1M tokens

Paid-tier Gemini Developer API pricing. Output price includes thinking tokens. Context caching also has a storage fee of $1.00 per 1M tokens per hour.

Pricing source ↗

Strengths

  • Hybrid reasoning with a controllable thinking budget, so you can dial reasoning up for hard tasks or off for cheap, fast responses
  • Low price for a reasoning-capable multimodal model: $0.30 per million input tokens and $2.50 per million output tokens
  • 1M-token context window for long documents, large codebases, and extended multimodal inputs
  • Native multimodal input across text, images, video, audio, and PDF documents
  • Strong cost-per-intelligence: outscores previous Flash models and the year-old Gemini 1.5 Pro on most evals

Best for

  • High-volume chat, summarization, and extraction where cost and latency matter
  • Long-context document and PDF analysis within a single 1M-token window
  • Multimodal tasks combining text with images, audio, or video
  • Agentic and tool-use workflows that need reasoning without flagship-tier pricing
  • Coding assistance and code understanding at scale

How to access

ProviderModel ID
Google AI Studio / Gemini API ↗gemini-2.5-flash
Google Vertex AI ↗gemini-2.5-flash

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Gemini 3.5 Flashcurrent2026-05-19Proprietary
Gemini 3 Flash2025-12-17Proprietary
Gemini 2.5 Flash2025-04-17Proprietary
Gemini 2.0 Flash2025-01-30Proprietary
Gemini 1.5 Flash2024-05-14Proprietary

FAQ

When was Gemini 2.5 Flash released?

Google launched Gemini 2.5 Flash in preview for developers on April 17, 2025, via the Gemini API in Google AI Studio and Vertex AI. It later moved to general availability, and Google shipped improved 2.5 Flash previews afterward.

How much does Gemini 2.5 Flash cost?

On the paid Gemini Developer API tier it costs $0.30 per million input tokens (text, image, or video) and $2.50 per million output tokens. Audio input is $1.00 per million tokens, and cached text input is $0.03 per million tokens. Thinking tokens are billed at the output rate.

What is the context window of Gemini 2.5 Flash?

Gemini 2.5 Flash has a 1 million-token input context window and can produce up to 64K output tokens. It accepts text, images, video, audio, and PDF documents as input.

What makes Gemini 2.5 Flash different from Gemini 2.5 Pro?

Both are hybrid "thinking" models, but Flash is tuned for speed and low cost while Pro is the more capable flagship. On Google's own benchmarks Pro scores higher (for example 86.4% vs 82.8% on GPQA diamond and 88.0% vs 72.0% on AIME 2025), but Flash is significantly cheaper, making it the better choice for high-volume, latency-sensitive workloads.