Overview
Gemini 2.0 Flash-Lite was Google's most cost-efficient model in the Gemini 2.0 generation, built as a cheaper, lower-latency companion to Gemini 2.0 Flash. Google positioned it as a drop-in upgrade from Gemini 1.5 Flash: better quality on most benchmarks at the same speed and cost. It first appeared in public preview on February 5, 2025 and became generally available on February 25, 2025.
Despite the 'Lite' branding, Gemini 2.0 Flash-Lite kept the full 1 million-token context window of the Flash line and accepted multimodal input (text, code, images, audio, and video), returning text output. It was aimed squarely at high-volume, latency-sensitive workloads — Google's own example was generating one-line captions for roughly 40,000 photos for under a dollar on the paid tier.
Gemini 2.0 Flash-Lite has since been retired. Google deprecated the model and shut it down on June 1, 2026, with the gemini-2.0-flash-lite-001 endpoint discontinued and users routed to newer Gemini Flash-Lite generations (2.5, 3, 3.1). It remains a useful reference point for the 2.0-era price-performance baseline.
| Released | 2025-02-01 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Context | 1,048,576 tokens (1M) |
| Max output | 8,192 tokens |
| Architecture | Proprietary Transformer-based multimodal model; the cost-optimized, lowest-latency tier of the Gemini 2.0 Flash family. Google has not published parameter counts or architectural internals. |
| Knowledge cutoff | June 2024 |
| Modalities | Text, Image, Audio, Video |
| Status | Retired — deprecated and shut down on June 1, 2026. The model ID gemini-2.0-flash-lite-001 is no longer available; Google directs users to newer Gemini Flash-Lite releases. |
Benchmarks
- MMLU-Pro71.6%
- MATH86.8%
- MMMU68%
- Global-MMLU-Lite78.2%
- FACTS Grounding83.6%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.075 per 1M tokens per 1M tokens |
|---|---|
| Output | $0.30 per 1M tokens per 1M tokens |
Paid-tier rate; a single flat price regardless of prompt size. A free tier was also offered. Model retired June 1, 2026; pricing shown is historical.
Strengths
- Lowest price in the Gemini 2.0 family ($0.075/M input, $0.30/M output), with a single flat rate regardless of prompt length
- Full 1M-token context window despite the 'Lite' positioning
- Multimodal input (text, code, image, audio, video) for cheap document, image, and video understanding
- Low latency and high throughput, suited to large-scale batch and real-time text generation
- Beat Gemini 1.5 Flash on most benchmarks at the same speed and cost — a clean cost-neutral upgrade
Best for
- High-volume text generation such as captioning, tagging, and summarization at scale
- Cheap classification, extraction, and routing in data pipelines
- Latency-sensitive chat and assistant features where cost per call matters
- Long-document and multimodal processing on a tight budget using the 1M-token window
- Prototyping and fallback tiers where quality is good-enough but spend must stay low
How to access
| Provider | Model ID |
|---|---|
| Google AI (Gemini API / AI Studio) ↗ | gemini-2.0-flash-lite |
| Google Cloud Vertex AI ↗ | gemini-2.0-flash-lite-001 |
Gemini Flash-Lite — every version
The full lineage of the Gemini Flash-Lite line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Gemini 3.1 Flash-Litecurrent | 2026-03-03 | — | Proprietary |
| Gemini 2.5 Flash-Lite | 2025-06-17 | — | Proprietary |
| Gemini 2.0 Flash-Lite | 2025-02-01 | — | Proprietary |
| Gemini 1.5 Flash-8B | 2024-10-03 | — | Proprietary |
FAQ
Is Gemini 2.0 Flash-Lite still available?
No. Google deprecated it and shut it down on June 1, 2026. The gemini-2.0-flash-lite-001 endpoint is discontinued; Google recommends migrating to a newer Gemini Flash-Lite generation (2.5, 3, or 3.1).
How much did Gemini 2.0 Flash-Lite cost?
On the paid tier it was $0.075 per million input tokens and $0.30 per million output tokens — a single flat rate regardless of prompt length, matching the price of Gemini 1.5 Flash. A free tier was also available.
What is the difference between Gemini 2.0 Flash and Flash-Lite?
Flash-Lite was the cheaper, lower-latency tier of the 2.0 family. Both shared the 1M-token context window and multimodal input, but Flash-Lite was cost-optimized for large-scale text workloads, trading some quality for a lower price point.
How large is the Gemini 2.0 Flash-Lite context window?
It supported up to 1,048,576 input tokens (1M) and produced up to 8,192 output tokens, with a knowledge cutoff of June 2024.