Gemini 2.5 Flash-Lite

Name: Gemini 2.5 Flash-Lite
Author: Google

Google's cheapest, fastest Gemini 2.5 model for high-volume tasks

Overview

Gemini 2.5 Flash-Lite is the smallest and cheapest model in Google's Gemini 2.5 family. Google first released it in preview on 17 June 2025 alongside the general availability of Gemini 2.5 Flash and Pro, then made Flash-Lite itself stable and generally available on 22 July 2025. It is positioned as a cost- and speed-optimized model for high-frequency, high-volume work such as classification, translation, data extraction, summarization, and intelligent routing.

Like the rest of the 2.5 line, Gemini 2.5 Flash-Lite is a 'thinking' model, but reasoning is off by default to keep latency and cost low. Developers can switch it on and set a controllable thinking budget when a task needs more careful reasoning, paying for the extra reasoning tokens only when they use them. The model accepts text, images, audio, video, and PDF input and returns text, with a 1,048,576-token (1M) context window and up to 65,536 output tokens.

It is Google's most affordable Gemini 2.5 model: $0.10 per million input tokens and $0.40 per million output tokens (audio input is priced higher at $0.30 per million). It also supports native tools including Grounding with Google Search, code execution, URL context, function calling, structured output, and context caching, and is available through both the Gemini API (Google AI Studio) and Vertex AI.

Released	2025-06-17
License	Proprietary (Google), API-only
Weights	API only
Parameters	Undisclosed
Context	1M
Max output	64K
Architecture	Sparse mixture-of-experts (MoE) transformer with optional ("thinking") reasoning; the smallest, most cost-efficient tier of the Gemini 2.5 family. Google has not disclosed parameter counts.
Knowledge cutoff	January 2025
Modalities	Text, Vision, Audio, Video, PDF
Status	Generally available

Benchmarks

GPQA diamond (science reasoning)64.6%
AIME 2025 (math)49.8%
MMMU (multimodal reasoning)72.9%
Global MMLU-Lite (multilingual)81.1%
FACTS Grounding84.1%
Vibe-Eval (multimodal)51.3%
SWE-bench Verified (agentic coding)31.6%
Aider Polyglot (code editing)26.7%
SimpleQA (factuality)10.7%
Humanity's Last Exam5.1%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.10 / 1M tokens (text / image / video) per 1M tokens
Cached input	$0.025 / 1M tokens (context cache, text / image / video) per 1M tokens
Output	$0.40 / 1M tokens per 1M tokens

Audio input is $0.30 / 1M tokens. Batch and Flex tiers are 50% of standard rates; cache storage is $1.00 per 1M tokens per hour. Prices are for the paid tier.

Pricing source ↗

Strengths

Lowest price in the Gemini 2.5 family ($0.10 input / $0.40 output per million tokens)
Lowest latency in the 2.5 family, built for high-throughput, high-volume workloads
Full 1M-token context window despite being the budget tier
Multimodal input: text, images, audio, video, and PDF
Optional thinking with a controllable, pay-only-when-used reasoning budget
Native tools: Google Search grounding, code execution, URL context, function calling, structured output, and caching

Best for

High-volume classification and intelligent request routing
Translation and large-scale summarization
Data extraction from documents, PDFs, and images
Latency-sensitive, cost-sensitive production pipelines
Long-document and long-context processing on a budget
Lightweight agentic tasks using search grounding and tool calls

How to access

Provider	Model ID
Google AI Studio / Gemini API ↗	`gemini-2.5-flash-lite`
Google Vertex AI ↗	`gemini-2.5-flash-lite`
OpenRouter ↗	`google/gemini-2.5-flash-lite`

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.1 Flash-Litecurrent	2026-03-03	—	Proprietary
Gemini 2.5 Flash-Lite	2025-06-17	—	Proprietary
Gemini 2.0 Flash-Lite	2025-02-01	—	Proprietary
Gemini 1.5 Flash-8B	2024-10-03	—	Proprietary

FAQ

When was Gemini 2.5 Flash-Lite released?

Google released Gemini 2.5 Flash-Lite in preview on 17 June 2025, alongside the general availability of Gemini 2.5 Flash and Pro. Flash-Lite itself reached stable general availability on 22 July 2025.

How much does Gemini 2.5 Flash-Lite cost?

It is the cheapest Gemini 2.5 model: $0.10 per million input tokens (text, image, or video) and $0.40 per million output tokens. Audio input costs $0.30 per million tokens. Context caching and Batch/Flex tiers lower costs further.

What context window and modalities does it support?

Gemini 2.5 Flash-Lite has a 1,048,576-token (1M) context window and produces up to 65,536 output tokens. It accepts text, images, audio, video, and PDF input and returns text output.

Does Gemini 2.5 Flash-Lite support reasoning ('thinking')?

Yes. Like the rest of the Gemini 2.5 family it is a thinking model, but reasoning is turned off by default to keep it fast and cheap. Developers can enable it and set a controllable thinking budget, paying for reasoning tokens only when they use them.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash-Lite — every version

// FAQ