Google · 2026-03-03 · notable
Gemini 3.1 Flash-Lite — cost-efficient multimodal model
Google's most cost-efficient Gemini 3 model: 2.5x faster time-to-first-token and 45% faster output than Gemini 2.5 Flash, a 1M-token context window, and $0.25/M input tokens. It beats GPT-5 mini and Claude Haiku on six of eleven benchmarks.

Google's cheapest and fastest Gemini 3 model: first tokens arrive 2.5x faster than with Gemini 2.5 Flash, input costs $0.25 per million tokens, and the context window holds a million tokens.
Key specs
| Spec | Value |
|---|---|
| Context window | 1M tokens |
| Input pricing | $0.25/M tokens |
| Output pricing | $1.50/M tokens |
| Time to first token (TTFT) | 2.5x faster than Gemini 2.5 Flash |
What is it?
Gemini 3.1 Flash-Lite is Google's most cost-efficient model in the Gemini 3 series, launched in preview on March 3, 2026. It is built for high-volume developer workloads where speed and cost matter more than maximum capability: translation, content moderation, UI generation, data extraction, and classification.
How does it work?
The model accepts multimodal prompts (text, images, code) of up to 1 million tokens and generates responses of up to 64,000 tokens. Compared with Gemini 2.5 Flash, it delivers its first output tokens 2.5x faster and generates output 45% faster. At $0.25/M input tokens and $1.50/M output tokens, it is priced to compete directly with GPT-5 mini and Claude Haiku, and it outscored both competitors on six of eleven benchmark tests.
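To make the pricing concrete, here is a minimal Python sketch that estimates per-request and daily cost from the quoted rates and checks token counts against the quoted limits. The function names and the example workload are illustrative assumptions, not part of any Google SDK, and preview pricing may change.

```python
# Figures quoted in this brief; preview pricing and limits may change.
INPUT_USD_PER_M = 0.25          # USD per 1M input tokens
OUTPUT_USD_PER_M = 1.50         # USD per 1M output tokens
MAX_PROMPT_TOKENS = 1_000_000   # quoted context window
MAX_OUTPUT_TOKENS = 64_000      # quoted maximum response length


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_PROMPT_TOKENS:
        raise ValueError("prompt exceeds the 1M-token context window")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("response exceeds the 64,000-token output limit")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def daily_cost_usd(calls_per_day: int, input_tokens: int, output_tokens: int) -> float:
    """Scale the per-request estimate to a uniform daily workload."""
    return calls_per_day * request_cost_usd(input_tokens, output_tokens)


if __name__ == "__main__":
    # Assumed workload: 1M classification calls per day, 500 tokens in, 50 out.
    print(f"per request: ${request_cost_usd(500, 50):.6f}")
    print(f"per day:     ${daily_cost_usd(1_000_000, 500, 50):,.2f}")
```

Under that assumed workload, each call costs about $0.0002, so a million calls a day comes to roughly $200. Output tokens cost six times as much as input tokens, so short responses (labels, extracted fields) keep the bill dominated by prompt size.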
Why does it matter?
For teams running AI at high volume (content moderation, translation pipelines, data processing), the cost of frontier models can be prohibitive. Flash-Lite sits in the sweet spot: fast enough for real-time use, cheap enough for millions of daily calls, and capable enough to beat the other economy-tier models on six of eleven benchmarks.
Who is it for?
High-volume API users, teams running classification/extraction pipelines, developers on a budget.
Try it
ai.google.dev/gemini-api (preview via AI Studio or Vertex AI)