AI/TLDR

Gemini 3.1 Flash-Lite

Google's cheapest, fastest Gemini 3 tier — a 1M-token multimodal model built for high-volume, latency-sensitive work.

Overview

Gemini 3.1 Flash-Lite is Google's fastest and lowest-cost member of the Gemini 3 family, built for high-volume, latency-sensitive jobs such as translation, classification, and large-scale summarisation. It is a natively multimodal reasoning model based on Gemini 3 Pro, accepting text, image, audio, and video input with a 1M-token context window and up to 64K output tokens.

Google launched Gemini 3.1 Flash-Lite in preview on March 3, 2026, and released the generally available model (id gemini-3.1-flash-lite) on May 7, 2026; the earlier gemini-3.1-flash-lite-preview endpoint was shut down on May 25, 2026. The model is served through the Gemini API, Google AI Studio, Vertex AI, the Gemini app, and Search AI Overviews.

Despite the low price, Google positions Flash-Lite as a capable model: at launch, Google reported that it took the top score on six of the benchmarks used to compare it with GPT-5 mini and Claude Haiku 4.5. It also runs noticeably faster than the previous generation: Google cites roughly 45% higher overall generation speed and a time-to-first-token about 2.5x shorter than Gemini 2.5 Flash.

Released: 2026-03-03
License: Proprietary
Weights: API only
Parameters: Undisclosed
Context: 1M
Max output: 64K
Architecture: Built on Gemini 3 Pro; natively multimodal reasoning model
Knowledge cutoff: Jan 2025
Modalities: Text, Vision, Audio, Video
Status: Generally available

Benchmarks

  1. GPQA Diamond: 86.9%
  2. MMMLU: 88.9%
  3. MMMU-Pro: 76.8%
  4. LiveCodeBench: 72%
  5. Video-MMMU: 84.8%
  6. Humanity's Last Exam: 16%

Scores are percentages on a 0–100 scale; higher is better.

Pricing

Input: $0.25 / 1M tokens
Cached input: $0.025 / 1M tokens
Output: $1.50 / 1M tokens

Rates are for the standard tier and apply to text, image, and video input; audio input is $0.50 per 1M tokens. Context-cache storage costs $1.00 per 1M tokens per hour.
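To make the rates concrete, here is a minimal Python sketch that estimates the cost of a single request from token counts. The rates are the ones listed on this page; the function name `estimate_cost` and its parameters are illustrative only, and the sketch ignores context-cache storage fees and any non-standard tiers.

```python
# Rates in USD per 1M tokens, copied from the pricing figures on this page
# (standard tier). Context-cache storage fees are not modelled here.
RATES_PER_MILLION = {
    "input": 0.25,         # text / image / video input
    "audio_input": 0.50,   # audio input
    "cached_input": 0.025, # cached text / image / video input
    "output": 1.50,        # output tokens
}


def estimate_cost(input_tokens=0, audio_input_tokens=0,
                  cached_input_tokens=0, output_tokens=0):
    """Return the estimated cost of one request in USD (hypothetical helper)."""
    usage = {
        "input": input_tokens,
        "audio_input": audio_input_tokens,
        "cached_input": cached_input_tokens,
        "output": output_tokens,
    }
    total = sum(tokens * RATES_PER_MILLION[kind] for kind, tokens in usage.items())
    return total / 1_000_000


# Example: a 200,000-token document summarised into 2,000 output tokens.
cost = estimate_cost(input_tokens=200_000, output_tokens=2_000)
print(f"${cost:.4f}")
```

At these rates, input usually dominates the bill for summarisation-style jobs, so caching a shared prompt prefix is where most of the savings would come from.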


Strengths

  • Very low cost at $0.25 input / $1.50 output per million tokens, aimed at intelligence per dollar
  • Fast: ~363 tokens/second output, ~45% faster generation and ~2.5x shorter time-to-first-token vs Gemini 2.5 Flash
  • 1M-token context window for long documents and large multimodal prompts
  • Natively multimodal — accepts text, image, audio, and video input
  • Strong reasoning for its tier: 86.9% on GPQA Diamond and 88.9% on MMMLU

Best for

  • High-volume, cost-sensitive workloads such as translation, classification, and tagging.
  • User-facing apps that need low latency and a fast first-token response.
  • Processing long or multimodal inputs (audio, video, images) cheaply within the 1M-token window.

How to access

Provider                  Model ID
Google Gemini API         gemini-3.1-flash-lite
Google Cloud Vertex AI    gemini-3.1-flash-lite
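As a rough illustration of calling the model through the Gemini API, the sketch below builds a `generateContent` REST request using only the Python standard library. The model id comes from this page. The `v1beta` base path, the `GEMINI_API_KEY` environment variable, and the helper names `build_request` and `generate` are assumptions for illustration, so check the current API documentation before relying on them. Vertex AI uses a different endpoint and authentication scheme and is not covered here.

```python
import json
import os
import urllib.request

MODEL_ID = "gemini-3.1-flash-lite"  # model id listed on this page
# Assumption: the public Gemini API REST base path at the time of writing.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt, api_key, model=MODEL_ID):
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


def generate(prompt, api_key):
    """Send the request and return the first text part of the first candidate."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        reply = json.load(resp)
    return reply["candidates"][0]["content"]["parts"][0]["text"]


# Usage (requires network access and a valid key in GEMINI_API_KEY):
#   print(generate("Classify the sentiment: 'Great battery life.'",
#                  os.environ["GEMINI_API_KEY"]))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without spending tokens.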

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first.

Version                           Released      License
Gemini 3.1 Flash-Lite (current)   2026-03-03    Proprietary
Gemini 2.5 Flash-Lite             2025-06-17    Proprietary
Gemini 2.0 Flash-Lite             2025-02-01    Proprietary
Gemini 1.5 Flash-8B               2024-10-03    Proprietary

FAQ

How much does Gemini 3.1 Flash-Lite cost?

On the standard paid tier of the Gemini API, Gemini 3.1 Flash-Lite costs $0.25 per million input tokens (text/image/video) and $1.50 per million output tokens. Audio input is $0.50 per million tokens, and cached text/image/video input is $0.025 per million.

What is the context window of Gemini 3.1 Flash-Lite?

Gemini 3.1 Flash-Lite has a 1 million-token context window and can generate up to 64,000 output tokens. It accepts text, image, audio, and video input.

When was Gemini 3.1 Flash-Lite released?

Google launched Gemini 3.1 Flash-Lite in preview on March 3, 2026. The generally available model (id gemini-3.1-flash-lite) shipped on May 7, 2026, and the preview endpoint was shut down on May 25, 2026.

How fast is Gemini 3.1 Flash-Lite compared with Gemini 2.5 Flash?

Google reports about 45% higher overall generation speed and a time-to-first-token roughly 2.5x shorter than Gemini 2.5 Flash, with output around 363 tokens per second.
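For a feel of what roughly 363 tokens per second means in practice, the Python sketch below converts an output length into an approximate streaming time. It is a back-of-the-envelope estimate only: it treats the quoted throughput as constant, leaves out time-to-first-token and network latency (no absolute figures for those are given here), and the helper name `estimate_generation_seconds` is illustrative.

```python
# Approximate output throughput quoted above, in tokens per second.
OUTPUT_TOKENS_PER_SECOND = 363


def estimate_generation_seconds(output_tokens,
                                tokens_per_second=OUTPUT_TOKENS_PER_SECOND):
    """Estimate seconds spent generating output, excluding time-to-first-token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Short reply, long report, and the 64K maximum output length.
for n in (500, 8_000, 64_000):
    print(f"{n:>6} tokens: ~{estimate_generation_seconds(n):.1f} s")
```

Real-world throughput varies with load, prompt size, and modality, so treat these numbers as a planning aid rather than a guarantee.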