Gemini 2.5 Flash

Name: Gemini 2.5 Flash
Author: Google

Google's fast, low-cost Gemini with switchable "thinking" and a 1M-token context window.

Overview

Gemini 2.5 Flash is Google's fast, cost-efficient member of the Gemini 2.5 family, first released in preview on April 17, 2025 and later promoted to general availability. It sits below Gemini 2.5 Pro but above the older 2.0 Flash line, and Google positions it as the workhorse model that gives the best cost-per-intelligence in the lineup.

Its headline feature is hybrid reasoning: Gemini 2.5 Flash is a "thinking" model that can reason step by step before answering, and developers can control a thinking budget to trade off quality against speed and cost. Thinking tokens are billed at the output rate, so turning thinking down directly lowers cost. The model accepts text, images, video, audio, and PDF documents as input and returns text, with a 1M-token context window and up to 64K output tokens.

Built as a sparse mixture-of-experts transformer, Gemini 2.5 Flash has a January 2025 knowledge cutoff and is served through the Gemini API in Google AI Studio and Vertex AI. Google later shipped improved 2.5 Flash previews and newer Flash generations (Gemini 3 Flash, 3.5 Flash), so the original 2.5 Flash is now a previous-generation model, though it remains widely used for high-volume, latency-sensitive work.

Released	2025-04-17
License	Proprietary
Weights	API only
Context	1M
Max output	64K
Architecture	Sparse mixture-of-experts (MoE) transformer with native multimodal support for text, vision, and audio. Total and active parameter counts are not publicly disclosed.
Knowledge cutoff	January 2025
Modalities	Text, Vision, Audio, Video, PDF
Status	Superseded

Benchmarks

GPQA (diamond, no tools)82.8%
AIME 202572%
LiveCodeBench59.3%
SWE-bench Verified (multiple attempts)60.3%
Aider Polyglot (single attempt)56.7%
Humanity's Last Exam (no tools)11%
Global MMLU (Lite)88.4%
MMMU79.7%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.30 / 1M tokens (text / image / video); $1.00 / 1M tokens (audio) per 1M tokens
Cached input	$0.03 / 1M tokens (text / image / video) per 1M tokens
Output	$2.50 / 1M tokens (includes thinking tokens) per 1M tokens

Paid-tier Gemini Developer API pricing. Output price includes thinking tokens. Context caching also has a storage fee of $1.00 per 1M tokens per hour.

Pricing source ↗

Strengths

Hybrid reasoning with a controllable thinking budget, so you can dial reasoning up for hard tasks or off for cheap, fast responses
Low price for a reasoning-capable multimodal model: $0.30 per million input tokens and $2.50 per million output tokens
1M-token context window for long documents, large codebases, and extended multimodal inputs
Native multimodal input across text, images, video, audio, and PDF documents
Strong cost-per-intelligence: outscores previous Flash models and the year-old Gemini 1.5 Pro on most evals

Best for

High-volume chat, summarization, and extraction where cost and latency matter
Long-context document and PDF analysis within a single 1M-token window
Multimodal tasks combining text with images, audio, or video
Agentic and tool-use workflows that need reasoning without flagship-tier pricing
Coding assistance and code understanding at scale

How to access

Provider	Model ID
Google AI Studio / Gemini API ↗	`gemini-2.5-flash`
Google Vertex AI ↗	`gemini-2.5-flash`

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.5 Flashcurrent	2026-05-19	—	Proprietary
Gemini 3 Flash	2025-12-17	—	Proprietary
Gemini 2.5 Flash	2025-04-17	—	Proprietary
Gemini 2.0 Flash	2025-01-30	—	Proprietary
Gemini 1.5 Flash	2024-05-14	—	Proprietary

FAQ

When was Gemini 2.5 Flash released?

Google launched Gemini 2.5 Flash in preview for developers on April 17, 2025, via the Gemini API in Google AI Studio and Vertex AI. It later moved to general availability, and Google shipped improved 2.5 Flash previews afterward.

How much does Gemini 2.5 Flash cost?

On the paid Gemini Developer API tier it costs $0.30 per million input tokens (text, image, or video) and $2.50 per million output tokens. Audio input is $1.00 per million tokens, and cached text input is $0.03 per million tokens. Thinking tokens are billed at the output rate.

What is the context window of Gemini 2.5 Flash?

Gemini 2.5 Flash has a 1 million-token input context window and can produce up to 64K output tokens. It accepts text, images, video, audio, and PDF documents as input.

What makes Gemini 2.5 Flash different from Gemini 2.5 Pro?

Both are hybrid "thinking" models, but Flash is tuned for speed and low cost while Pro is the more capable flagship. On Google's own benchmarks Pro scores higher (for example 86.4% vs 82.8% on GPQA diamond and 88.0% vs 72.0% on AIME 2025), but Flash is significantly cheaper, making it the better choice for high-volume, latency-sensitive workloads.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash — every version

// FAQ