AI/TLDR

Gemini 2.5 Flash-Lite

Google's cheapest, fastest Gemini 2.5 model for high-volume tasks

Overview

Gemini 2.5 Flash-Lite is the smallest and cheapest model in Google's Gemini 2.5 family. Google first released it in preview on 17 June 2025 alongside the general availability of Gemini 2.5 Flash and Pro, then made Flash-Lite itself stable and generally available on 22 July 2025. It is positioned as a cost- and speed-optimized model for high-frequency, high-volume work such as classification, translation, data extraction, summarization, and intelligent routing.

Like the rest of the 2.5 line, Gemini 2.5 Flash-Lite is a 'thinking' model, but reasoning is off by default to keep latency and cost low. Developers can switch it on and set a controllable thinking budget when a task needs more careful reasoning, paying for the extra reasoning tokens only when they use them. The model accepts text, images, audio, video, and PDF input and returns text, with a 1,048,576-token (1M) context window and up to 65,536 output tokens.

It is Google's most affordable Gemini 2.5 model: $0.10 per million input tokens and $0.40 per million output tokens (audio input is priced higher at $0.30 per million). It also supports native tools including Grounding with Google Search, code execution, URL context, function calling, structured output, and context caching, and is available through both the Gemini API (Google AI Studio) and Vertex AI.

Released2025-06-17
LicenseProprietary (Google), API-only
WeightsAPI only
ParametersUndisclosed
Context1M
Max output64K
ArchitectureSparse mixture-of-experts (MoE) transformer with optional ("thinking") reasoning; the smallest, most cost-efficient tier of the Gemini 2.5 family. Google has not disclosed parameter counts.
Knowledge cutoffJanuary 2025
ModalitiesText, Vision, Audio, Video, PDF
StatusGenerally available

Benchmarks

  1. GPQA diamond (science reasoning)64.6%
  2. AIME 2025 (math)49.8%
  3. MMMU (multimodal reasoning)72.9%
  4. Global MMLU-Lite (multilingual)81.1%
  5. FACTS Grounding84.1%
  6. Vibe-Eval (multimodal)51.3%
  7. SWE-bench Verified (agentic coding)31.6%
  8. Aider Polyglot (code editing)26.7%
  9. SimpleQA (factuality)10.7%
  10. Humanity's Last Exam5.1%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.10 / 1M tokens (text / image / video) per 1M tokens
Cached input$0.025 / 1M tokens (context cache, text / image / video) per 1M tokens
Output$0.40 / 1M tokens per 1M tokens

Audio input is $0.30 / 1M tokens. Batch and Flex tiers are 50% of standard rates; cache storage is $1.00 per 1M tokens per hour. Prices are for the paid tier.

Pricing source ↗

Strengths

  • Lowest price in the Gemini 2.5 family ($0.10 input / $0.40 output per million tokens)
  • Lowest latency in the 2.5 family, built for high-throughput, high-volume workloads
  • Full 1M-token context window despite being the budget tier
  • Multimodal input: text, images, audio, video, and PDF
  • Optional thinking with a controllable, pay-only-when-used reasoning budget
  • Native tools: Google Search grounding, code execution, URL context, function calling, structured output, and caching

Best for

  • High-volume classification and intelligent request routing
  • Translation and large-scale summarization
  • Data extraction from documents, PDFs, and images
  • Latency-sensitive, cost-sensitive production pipelines
  • Long-document and long-context processing on a budget
  • Lightweight agentic tasks using search grounding and tool calls

How to access

ProviderModel ID
Google AI Studio / Gemini API ↗gemini-2.5-flash-lite
Google Vertex AI ↗gemini-2.5-flash-lite
OpenRouter ↗google/gemini-2.5-flash-lite

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Gemini 3.1 Flash-Litecurrent2026-03-03Proprietary
Gemini 2.5 Flash-Lite2025-06-17Proprietary
Gemini 2.0 Flash-Lite2025-02-01Proprietary
Gemini 1.5 Flash-8B2024-10-03Proprietary

FAQ

When was Gemini 2.5 Flash-Lite released?

Google released Gemini 2.5 Flash-Lite in preview on 17 June 2025, alongside the general availability of Gemini 2.5 Flash and Pro. Flash-Lite itself reached stable general availability on 22 July 2025.

How much does Gemini 2.5 Flash-Lite cost?

It is the cheapest Gemini 2.5 model: $0.10 per million input tokens (text, image, or video) and $0.40 per million output tokens. Audio input costs $0.30 per million tokens. Context caching and Batch/Flex tiers lower costs further.

What context window and modalities does it support?

Gemini 2.5 Flash-Lite has a 1,048,576-token (1M) context window and produces up to 65,536 output tokens. It accepts text, images, audio, video, and PDF input and returns text output.

Does Gemini 2.5 Flash-Lite support reasoning ('thinking')?

Yes. Like the rest of the Gemini 2.5 family it is a thinking model, but reasoning is turned off by default to keep it fast and cheap. Developers can enable it and set a controllable thinking budget, paying for reasoning tokens only when they use them.