AI/TLDR

Gemini 2.0 Flash-Lite

Google's cheapest 2.0-era Gemini, tuned for high-volume, low-cost text.

Overview

Gemini 2.0 Flash-Lite was Google's most cost-efficient model in the Gemini 2.0 generation, built as a cheaper, lower-latency companion to Gemini 2.0 Flash. Google positioned it as a drop-in upgrade from Gemini 1.5 Flash: better quality on most benchmarks at the same speed and cost. It first appeared in public preview on February 5, 2025 and became generally available on February 25, 2025.

Despite the 'Lite' branding, Gemini 2.0 Flash-Lite kept the full 1 million-token context window of the Flash line and accepted multimodal input (text, code, images, audio, and video), returning text output. It was aimed squarely at high-volume, latency-sensitive workloads — Google's own example was generating one-line captions for roughly 40,000 photos for under a dollar on the paid tier.

Gemini 2.0 Flash-Lite has since been retired. Google deprecated the model and shut it down on June 1, 2026; the gemini-2.0-flash-lite-001 endpoint was discontinued, and users were directed to newer Gemini Flash-Lite generations (2.5, 3, 3.1). It remains a useful reference point for the 2.0-era price-performance baseline.

Released: 2025-02-01
License: Proprietary
Weights: API only
Context: 1,048,576 tokens (1M)
Max output: 8,192 tokens
Architecture: Proprietary Transformer-based multimodal model; the cost-optimized, lowest-latency tier of the Gemini 2.0 Flash family. Google has not published parameter counts or architectural internals.
Knowledge cutoff: June 2024
Modalities: Text, Image, Audio, Video
Status: Retired. Deprecated and shut down on June 1, 2026. The model ID gemini-2.0-flash-lite-001 is no longer available; Google directs users to newer Gemini Flash-Lite releases.

Benchmarks

  1. MMLU-Pro: 71.6%
  2. MATH: 86.8%
  3. MMMU: 68%
  4. Global-MMLU-Lite: 78.2%
  5. FACTS Grounding: 83.6%

Scores are on a 0–100 scale; higher is better.

Pricing

Input: $0.075 per 1M tokens
Output: $0.30 per 1M tokens

Paid-tier rate; a single flat price regardless of prompt size. A free tier was also offered. Model retired June 1, 2026; pricing shown is historical.
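As a rough check on the captioning figure quoted in the overview, the flat rates above can be turned into a small estimator. This is only a sketch under stated assumptions: the per-image token count, prompt length, and caption length are illustrative guesses rather than figures from Google's example, and `estimate_cost_usd` is a hypothetical helper name.

```python
# Historical paid-tier rates quoted above, in USD per 1M tokens.
INPUT_USD_PER_M = 0.075
OUTPUT_USD_PER_M = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Flat-rate cost: (tokens * rate) summed over input and output, per 1M tokens."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


# Assumed captioning workload (hypothetical values, not from the source):
PHOTOS = 40_000
TOKENS_PER_IMAGE = 258  # assumption: tokens billed for one image
PROMPT_TOKENS = 10      # assumption: short instruction sent with each image
CAPTION_TOKENS = 10     # assumption: length of a one-line caption

total_in = PHOTOS * (TOKENS_PER_IMAGE + PROMPT_TOKENS)
total_out = PHOTOS * CAPTION_TOKENS

print(f"Estimated cost: ${estimate_cost_usd(total_in, total_out):.2f}")
```

With these assumed numbers the estimate lands a little above 90 cents, which is consistent with the under-a-dollar claim. Longer prompts or captions would push the total higher.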


Strengths

  • Lowest price in the Gemini 2.0 family ($0.075/M input, $0.30/M output), with a single flat rate regardless of prompt length
  • Full 1M-token context window despite the 'Lite' positioning
  • Multimodal input (text, code, image, audio, video) for cheap document, image, and video understanding
  • Low latency and high throughput, suited to large-scale batch and real-time text generation
  • Beat Gemini 1.5 Flash on most benchmarks at the same speed and cost — a clean cost-neutral upgrade

Best for

  • High-volume text generation such as captioning, tagging, and summarization at scale
  • Cheap classification, extraction, and routing in data pipelines
  • Latency-sensitive chat and assistant features where cost per call matters
  • Long-document and multimodal processing on a tight budget using the 1M-token window
  • Prototyping and fallback tiers where quality is good-enough but spend must stay low

How to access

Provider | Model ID
Google AI (Gemini API / AI Studio) | gemini-2.0-flash-lite
Google Cloud Vertex AI | gemini-2.0-flash-lite-001

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first.

Version | Released | License
Gemini 3.1 Flash-Lite (current) | 2026-03-03 | Proprietary
Gemini 2.5 Flash-Lite | 2025-06-17 | Proprietary
Gemini 2.0 Flash-Lite | 2025-02-01 | Proprietary
Gemini 1.5 Flash-8B | 2024-10-03 | Proprietary

FAQ

Is Gemini 2.0 Flash-Lite still available?

No. Google deprecated it and shut it down on June 1, 2026. The gemini-2.0-flash-lite-001 endpoint is discontinued; Google recommends migrating to a newer Gemini Flash-Lite generation (2.5, 3, or 3.1).

How much did Gemini 2.0 Flash-Lite cost?

On the paid tier it was $0.075 per million input tokens and $0.30 per million output tokens — a single flat rate regardless of prompt length, matching the price of Gemini 1.5 Flash. A free tier was also available.

What is the difference between Gemini 2.0 Flash and Flash-Lite?

Flash-Lite was the cheaper, lower-latency tier of the 2.0 family. Both shared the 1M-token context window and multimodal input, but Flash-Lite was cost-optimized for large-scale text workloads, trading some quality for a lower price point.

How large is the Gemini 2.0 Flash-Lite context window?

It supported up to 1,048,576 input tokens (1M) and produced up to 8,192 output tokens, with a knowledge cutoff of June 2024.
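For pipelines that were sized against these limits, a pre-flight check along the following lines may be a useful illustration. It is a minimal sketch: the helper names are hypothetical, token counts are assumed to come from whatever tokenizer or counting endpoint the caller already uses, and the input and output limits are treated as independent caps, which is an assumption rather than something the figures above confirm.

```python
MAX_INPUT_TOKENS = 1_048_576  # context window stated above (2**20)
MAX_OUTPUT_TOKENS = 8_192     # output cap stated above (2**13)


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if both counts fall within the stated limits.

    Assumption: the input and output caps apply independently.
    """
    return (
        0 <= prompt_tokens <= MAX_INPUT_TOKENS
        and 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


def clamp_output_tokens(requested: int) -> int:
    """Clamp a requested output length into the range 1..MAX_OUTPUT_TOKENS."""
    return max(1, min(requested, MAX_OUTPUT_TOKENS))


print(fits_limits(900_000, 4_096))
print(fits_limits(1_200_000, 4_096))
print(clamp_output_tokens(20_000))
```

The first call is within both limits, the second exceeds the input window, and the third request is clamped down to the 8,192-token output cap.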