Gemini 2.0 Flash-Lite

Name: Gemini 2.0 Flash-Lite
Author: Google

Google's cheapest 2.0-era Gemini, tuned for high-volume, low-cost text.

Overview

Gemini 2.0 Flash-Lite was Google's most cost-efficient model in the Gemini 2.0 generation, built as a cheaper, lower-latency companion to Gemini 2.0 Flash. Google positioned it as a drop-in upgrade from Gemini 1.5 Flash: better quality on most benchmarks at the same speed and cost. It first appeared in public preview on February 5, 2025 and became generally available on February 25, 2025.

Despite the 'Lite' branding, Gemini 2.0 Flash-Lite kept the full 1 million-token context window of the Flash line and accepted multimodal input (text, code, images, audio, and video), returning text output. It was aimed squarely at high-volume, latency-sensitive workloads — Google's own example was generating one-line captions for roughly 40,000 photos for under a dollar on the paid tier.

Gemini 2.0 Flash-Lite has since been retired. Google deprecated the model and shut it down on June 1, 2026, with the gemini-2.0-flash-lite-001 endpoint discontinued and users routed to newer Gemini Flash-Lite generations (2.5, 3, 3.1). It remains a useful reference point for the 2.0-era price-performance baseline.

Released	2025-02-05 (public preview); generally available 2025-02-25
License	Proprietary
Weights	API only
Context	1,048,576 tokens (1M)
Max output	8,192 tokens
Architecture	Proprietary Transformer-based multimodal model; the cost-optimized, lowest-latency tier of the Gemini 2.0 Flash family. Google has not published parameter counts or architectural internals.
Knowledge cutoff	June 2024
Modalities	Input: Text, Image, Audio, Video; Output: Text
Status	Retired — deprecated and shut down on June 1, 2026. The model ID gemini-2.0-flash-lite-001 is no longer available; Google directs users to newer Gemini Flash-Lite releases.

Benchmarks

[Benchmark chart not reproduced. Scores were plotted on a 0–100 scale; higher is better.]

Pricing

Input	$0.075 per 1M tokens
Output	$0.30 per 1M tokens

Paid-tier rate; a single flat price regardless of prompt size. A free tier was also offered. Model retired June 1, 2026; pricing shown is historical.
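
Because the rate was flat, estimating the cost of a workload reduces to simple arithmetic. The sketch below is a minimal illustration, not an official calculator: the helper name `estimate_cost_usd` is hypothetical, and the per-request token counts in the usage lines are assumed values chosen for illustration, not published figures. Only the two per-million rates come from the table above.

```python
# Historical paid-tier rates from the pricing table (USD per 1M tokens).
INPUT_USD_PER_M = 0.075
OUTPUT_USD_PER_M = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate paid-tier cost at the flat historical rate.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Assumed workload: 40,000 requests, each using 250 input tokens and
# 20 output tokens. These per-request counts are illustrative guesses.
num_requests = 40_000
total = estimate_cost_usd(num_requests * 250, num_requests * 20)
print(f"${total:.2f}")  # prints $0.99
```

Under these assumed token counts, input accounts for $0.75 and output for $0.24, which is consistent with the "under a dollar" scale of the captioning example in the Overview. Actual costs depended on the real token counts of each request.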


Strengths

Lowest price in the Gemini 2.0 family ($0.075/M input, $0.30/M output), with a single flat rate regardless of prompt length
Full 1M-token context window despite the 'Lite' positioning
Multimodal input (text, code, image, audio, video) for cheap document, image, and video understanding
Low latency and high throughput, suited to large-scale batch and real-time text generation
Beat Gemini 1.5 Flash on most benchmarks at the same speed and cost — a clean cost-neutral upgrade

Best for

High-volume text generation such as captioning, tagging, and summarization at scale
Cheap classification, extraction, and routing in data pipelines
Latency-sensitive chat and assistant features where cost per call matters
Long-document and multimodal processing on a tight budget using the 1M-token window
Prototyping and fallback tiers where quality is good enough but spend must stay low

How to access

Provider	Model ID
Google AI (Gemini API / AI Studio)	`gemini-2.0-flash-lite`
Google Cloud Vertex AI	`gemini-2.0-flash-lite-001`
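
For reference, a request to the Gemini API's REST `generateContent` method could have been assembled roughly as follows. This is a hedged sketch: the helper name `build_generate_request` and the placeholder key `YOUR_API_KEY` are hypothetical, and the request is only constructed, never sent. Because the model is retired, a real call with this model ID would no longer be expected to succeed.

```python
import json

# Base URL of the Gemini API (v1beta REST surface).
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt: str, api_key: str,
                           model: str = "gemini-2.0-flash-lite",
                           max_output_tokens: int = 8192) -> dict:
    """Build (but do not send) a generateContent request.

    Hypothetical helper for illustration; returns URL, headers, and JSON body.
    """
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }
    return {
        "url": f"{BASE_URL}/models/{model}:generateContent",
        "headers": {
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        "body": json.dumps(body),
    }


request = build_generate_request(
    "Write a one-line caption for a beach photo.", "YOUR_API_KEY")
print(request["url"])
```

Sending it would be a plain HTTPS POST of `body` to `url` with those headers. Vertex AI uses a different endpoint and authentication scheme, so this sketch applies only to the Gemini API row of the table.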

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first.

Version	Released	Context	License
Gemini 3.1 Flash-Lite (current)	2026-03-03	—	Proprietary
Gemini 2.5 Flash-Lite	2025-06-17	—	Proprietary
Gemini 2.0 Flash-Lite	2025-02-05	—	Proprietary
Gemini 1.5 Flash-8B	2024-10-03	—	Proprietary

FAQ

Is Gemini 2.0 Flash-Lite still available?

No. Google deprecated it and shut it down on June 1, 2026. The gemini-2.0-flash-lite-001 endpoint is discontinued; Google recommends migrating to a newer Gemini Flash-Lite generation (2.5, 3, or 3.1).

How much did Gemini 2.0 Flash-Lite cost?

On the paid tier it was $0.075 per million input tokens and $0.30 per million output tokens — a single flat rate regardless of prompt length, matching the price of Gemini 1.5 Flash. A free tier was also available.

What is the difference between Gemini 2.0 Flash and Flash-Lite?

Flash-Lite was the cheaper, lower-latency tier of the 2.0 family. Both shared the 1M-token context window and multimodal input, but Flash-Lite was cost-optimized for large-scale text workloads, trading some quality for a lower price point.

How large is the Gemini 2.0 Flash-Lite context window?

It supported up to 1,048,576 input tokens (1M) and produced up to 8,192 output tokens, with a knowledge cutoff of June 2024.
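
A pre-flight check against these two limits can be sketched as below. The helper name `fits_limits` is hypothetical, and the token counts are assumed to have been measured elsewhere (for example by a tokenizer or a token-counting call); only the two limit values come from the answer above.

```python
# Limits stated above for Gemini 2.0 Flash-Lite.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 8_192


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both stated limits.

    Hypothetical helper; it does not count tokens itself.
    """
    return (0 <= input_tokens <= MAX_INPUT_TOKENS
            and 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS)


print(fits_limits(900_000, 4_096))    # within both limits
print(fits_limits(1_200_000, 4_096))  # input exceeds the 1M window
```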
