Overview
Gemini 3.1 Flash-Lite is Google's fastest and lowest-cost member of the Gemini 3 family, built for high-volume, latency-sensitive jobs such as translation, classification, and large-scale summarisation. It is a natively multimodal reasoning model based on Gemini 3 Pro, accepting text, image, audio, and video input with a 1M-token context window and up to 64K output tokens.
Google launched Gemini 3.1 Flash-Lite in preview on March 3, 2026, and made the GA model (id gemini-3.1-flash-lite) generally available on May 7, 2026; the earlier gemini-3.1-flash-lite-preview endpoint was shut down on May 25, 2026. The model is served through the Gemini API, Google AI Studio, Vertex AI, the Gemini app, and Search AI Overviews.
Despite the low price, Google positions Flash-Lite as a capable model: at launch it reported the top score across six of the benchmarks Google used to compare it with GPT-5 mini and Claude 4.5 Haiku. It also runs noticeably faster than the previous generation — Google cites roughly 45% higher overall generation speed and a time-to-first-token about 2.5x shorter than Gemini 2.5 Flash.
| Released | 2026-03-03 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Parameters | Undisclosed |
| Context | 1M |
| Max output | 64K |
| Architecture | Built on Gemini 3 Pro; natively multimodal reasoning model |
| Knowledge cutoff | Jan 2025 |
| Modalities | Text, Vision, Audio, Video |
| Status | Generally available |
Benchmarks
- GPQA Diamond86.9%
- MMMLU88.9%
- MMMU-Pro76.8%
- LiveCodeBench72%
- Video-MMMU84.8%
- Humanity's Last Exam16%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.25 / 1M tokens |
|---|---|
| Cached input | $0.025 / 1M tokens |
| Output | $1.50 / 1M tokens |
Standard tier, text/image/video; audio input is $0.50/1M. Context-cache storage $1.00/1M tokens per hour.
Strengths
- Very low cost at $0.25 input / $1.50 output per million tokens, aimed at intelligence per dollar
- Fast: ~363 tokens/second output, ~45% faster generation and ~2.5x shorter time-to-first-token vs Gemini 2.5 Flash
- 1M-token context window for long documents and large multimodal prompts
- Natively multimodal — accepts text, image, audio, and video input
- Strong reasoning for its tier: 86.9% on GPQA Diamond and 88.9% on MMMLU
Best for
- Reach for it for high-volume, cost-sensitive workloads like translation, classification, and tagging.
- Reach for it when you need low latency and fast first-token response in user-facing apps.
- Reach for it to process long or multimodal inputs (audio/video/images) cheaply within a 1M-token window.
How to access
| Provider | Model ID |
|---|---|
| Google Gemini API ↗ | gemini-3.1-flash-lite |
| Google Cloud Vertex AI ↗ | gemini-3.1-flash-lite |
Gemini Flash-Lite — every version
The full lineage of the Gemini Flash-Lite line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Gemini 3.1 Flash-Litecurrent | 2026-03-03 | — | Proprietary |
| Gemini 2.5 Flash-Lite | 2025-06-17 | — | Proprietary |
| Gemini 2.0 Flash-Lite | 2025-02-01 | — | Proprietary |
| Gemini 1.5 Flash-8B | 2024-10-03 | — | Proprietary |
FAQ
How much does Gemini 3.1 Flash-Lite cost?
On the standard paid tier of the Gemini API, Gemini 3.1 Flash-Lite costs $0.25 per million input tokens (text/image/video) and $1.50 per million output tokens. Audio input is $0.50 per million tokens, and cached text/image/video input is $0.025 per million.
What is the context window of Gemini 3.1 Flash-Lite?
Gemini 3.1 Flash-Lite has a 1 million-token context window and can generate up to 64,000 output tokens. It accepts text, image, audio, and video input.
When was Gemini 3.1 Flash-Lite released?
Google launched Gemini 3.1 Flash-Lite in preview on March 3, 2026. The generally available model id gemini-3.1-flash-lite shipped on May 7, 2026, and the preview endpoint was shut down on May 25, 2026.
How fast is Gemini 3.1 Flash-Lite compared with Gemini 2.5 Flash?
Google reports about 45% higher overall generation speed and a time-to-first-token roughly 2.5x shorter than Gemini 2.5 Flash, with output around 363 tokens per second.