Gemini 3.1 Flash-Lite

Name: Gemini 3.1 Flash-Lite
Author: Google

Google's cheapest, fastest Gemini 3 tier — a 1M-token multimodal model built for high-volume, latency-sensitive work.

Overview

Gemini 3.1 Flash-Lite is Google's fastest and lowest-cost member of the Gemini 3 family, built for high-volume, latency-sensitive jobs such as translation, classification, and large-scale summarisation. It is a natively multimodal reasoning model based on Gemini 3 Pro, accepting text, image, audio, and video input with a 1M-token context window and up to 64K output tokens.

Google launched Gemini 3.1 Flash-Lite in preview on March 3, 2026, and made it generally available under the model ID gemini-3.1-flash-lite on May 7, 2026; the earlier gemini-3.1-flash-lite-preview endpoint was shut down on May 25, 2026. The model is served through the Gemini API, Google AI Studio, Vertex AI, the Gemini app, and Search AI Overviews.

Despite the low price, Google positions Flash-Lite as a capable model: at launch, Google reported that it took the top score on six of the benchmarks used to compare it with GPT-5 mini and Claude 4.5 Haiku. It also runs noticeably faster than the previous generation: Google cites roughly 45% higher overall generation speed and a time-to-first-token about 2.5x shorter than Gemini 2.5 Flash.

Released	2026-03-03
License	Proprietary
Weights	API only
Parameters	Undisclosed
Context	1M
Max output	64K
Architecture	Built on Gemini 3 Pro; natively multimodal reasoning model
Knowledge cutoff	Jan 2025
Modalities	Text, Vision, Audio, Video
Status	Generally available

Benchmarks

Scores are reported on a 0–100 scale; higher is better. (Benchmark chart not reproduced.)

Pricing

Input	$0.25 / 1M tokens
Cached input	$0.025 / 1M tokens
Output	$1.50 / 1M tokens

Prices are for the standard tier and apply to text, image, and video input; audio input is $0.50 / 1M tokens. Context-cache storage is $1.00 / 1M tokens per hour.
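As a rough illustration of how these rates combine, the sketch below estimates the cost of a workload from token counts. The rates are copied from the pricing table above; the function name `estimate_cost` and its parameters are hypothetical, it assumes the four token counts do not overlap, and it leaves out context-cache storage, which is billed per hour rather than per request.

```python
from decimal import Decimal

# Standard-tier rates in USD per 1M tokens, copied from the pricing table above.
RATES = {
    "input": Decimal("0.25"),          # text / image / video input
    "audio_input": Decimal("0.50"),    # audio input
    "cached_input": Decimal("0.025"),  # cached text / image / video input
    "output": Decimal("1.50"),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens=0, audio_input_tokens=0,
                  cached_input_tokens=0, output_tokens=0):
    """Estimate the USD cost of a workload (hypothetical helper).

    Assumes the counts are non-overlapping and ignores cache storage fees.
    """
    usage = {
        "input": input_tokens,
        "audio_input": audio_input_tokens,
        "cached_input": cached_input_tokens,
        "output": output_tokens,
    }
    if any(count < 0 for count in usage.values()):
        raise ValueError("token counts must be non-negative")
    return sum(
        (RATES[kind] * Decimal(count) / TOKENS_PER_UNIT
         for kind, count in usage.items()),
        Decimal(0),
    )
```

For example, 10,000 classification requests averaging 2,000 input tokens and 500 output tokens each come to 20M input and 5M output tokens, which `estimate_cost(input_tokens=20_000_000, output_tokens=5_000_000)` prices at $12.50 under these rates.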


Strengths

Very low cost at $0.25 input / $1.50 output per million tokens, aimed at intelligence per dollar
Fast: ~363 tokens/second output, ~45% faster generation and ~2.5x shorter time-to-first-token vs Gemini 2.5 Flash
1M-token context window for long documents and large multimodal prompts
Natively multimodal — accepts text, image, audio, and video input
Strong reasoning for its tier: 86.9% on GPQA Diamond and 88.9% on MMMLU

Best for

High-volume, cost-sensitive workloads such as translation, classification, and tagging.
User-facing apps that need low latency and a fast first-token response.
Cheap processing of long or multimodal inputs (audio, video, images) within a 1M-token window.

How to access

Provider	Model ID
Google Gemini API	`gemini-3.1-flash-lite`
Google Cloud Vertex AI	`gemini-3.1-flash-lite`
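A minimal sketch of calling the model through the Gemini API over REST, using only the Python standard library. It assumes the model ID above is served at the publicly documented `v1beta` `generateContent` endpoint with an `x-goog-api-key` header; the helper names `build_request` and `generate` are hypothetical, and Vertex AI uses a different endpoint and authentication scheme that is not shown here.

```python
import json
import urllib.request

# Assumption: the model is reachable through the Gemini API's v1beta REST surface.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-3.1-flash-lite"  # model ID from the table above


def build_request(prompt: str, api_key: str,
                  model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first candidate."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["candidates"][0]["content"]["parts"][0]["text"]
```

Splitting request construction from sending keeps the payload easy to inspect without network access; `generate("Classify this review: ...", api_key)` would perform the actual call.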

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first. Every version has its own page with specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.1 Flash-Lite (current)	2026-03-03	—	Proprietary
Gemini 2.5 Flash-Lite	2025-06-17	—	Proprietary
Gemini 2.0 Flash-Lite	2025-02-01	—	Proprietary
Gemini 1.5 Flash-8B	2024-10-03	—	Proprietary

FAQ

How much does Gemini 3.1 Flash-Lite cost?

On the standard paid tier of the Gemini API, Gemini 3.1 Flash-Lite costs $0.25 per million input tokens (text/image/video) and $1.50 per million output tokens. Audio input is $0.50 per million tokens, and cached text/image/video input is $0.025 per million.

What is the context window of Gemini 3.1 Flash-Lite?

Gemini 3.1 Flash-Lite has a 1 million-token context window and can generate up to 64,000 output tokens. It accepts text, image, audio, and video input.

When was Gemini 3.1 Flash-Lite released?

Google launched Gemini 3.1 Flash-Lite in preview on March 3, 2026. The generally available model id gemini-3.1-flash-lite shipped on May 7, 2026, and the preview endpoint was shut down on May 25, 2026.

How fast is Gemini 3.1 Flash-Lite compared with Gemini 2.5 Flash?

Google reports about 45% higher overall generation speed and a time-to-first-token roughly 2.5x shorter than Gemini 2.5 Flash, with output around 363 tokens per second.
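To turn the throughput figure into a rough end-to-end estimate, the sketch below adds a time-to-first-token to the streaming time at about 363 tokens per second. The page gives time-to-first-token only relative to Gemini 2.5 Flash, so `ttft_seconds` is a hypothetical value you would measure yourself; the function name is also illustrative.

```python
# Approximate output throughput quoted above, in tokens per second.
OUTPUT_TOKENS_PER_SECOND = 363


def estimated_response_seconds(output_tokens: int, ttft_seconds: float) -> float:
    """Estimate total response time: time-to-first-token plus generation time.

    ttft_seconds is a measured or assumed value; it is not stated on this page.
    """
    if output_tokens < 0 or ttft_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return ttft_seconds + output_tokens / OUTPUT_TOKENS_PER_SECOND
```

Under this simple model, generation time scales linearly with output length, so a full 64K-token response would take on the order of three minutes regardless of how short the time-to-first-token is.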
