AI/TLDR

Google · 2026-03-03 · notable

Gemini 3.1 Flash-Lite — cost-efficient multimodal model

Google's most cost-efficient Gemini 3 model: 2.5x faster time-to-first-token and 45% faster output than Gemini 2.5 Flash, a 1M-token context window, and $0.25/M input tokens. It beats GPT-5 mini and Claude Haiku on six of eleven benchmarks.


Google's cheapest and fastest Gemini 3 model: 2.5x faster first-token delivery than Gemini 2.5 Flash, $0.25 per million input tokens, and a million-token context window.

Key specs

Context window: 1M tokens
Input pricing: $0.25/M tokens
Output pricing: $1.50/M tokens
TTFT improvement: 2.5x faster

What is it?

Gemini 3.1 Flash-Lite is Google's most cost-efficient model in the Gemini 3 series, launched in preview on March 3, 2026. It is built for high-volume developer workloads where speed and cost matter more than maximum capability: translation, content moderation, UI generation, data extraction, and classification.

How does it work?

The model accepts multimodal prompts (text, images, code) of up to 1 million tokens and generates responses of up to 64,000 tokens. Compared with Gemini 2.5 Flash, it delivers its first output token 2.5x faster and generates output 45% faster. At $0.25 per million input tokens and $1.50 per million output tokens, it is priced to compete directly with GPT-5 mini and Claude Haiku, and it outscored both competitors on six of eleven benchmarks.
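To make the pricing concrete, here is a minimal cost-estimate sketch. The prices and token limits come from the figures above; the function name `estimate_cost` and the example workload (one million classification calls of 800 input tokens and 50 output tokens each) are assumptions chosen for illustration, not measured usage. Actual billing may differ, for example if images are counted differently or preview pricing changes.

```python
# Prices and limits as stated in this article; everything else is hypothetical.
INPUT_PRICE_PER_M = 0.25    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50   # USD per 1M output tokens
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


def estimate_cost(input_tokens: int, output_tokens: int, calls: int = 1) -> float:
    """Estimate the USD cost of `calls` requests of the same size."""
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("prompt exceeds the 1M-token context window")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("response exceeds the 64,000-token output limit")
    # Sum token totals across all calls first, then convert to millions.
    input_cost = input_tokens * calls * INPUT_PRICE_PER_M
    output_cost = output_tokens * calls * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M short classification calls per day.
    daily = estimate_cost(input_tokens=800, output_tokens=50, calls=1_000_000)
    print(f"Estimated daily cost: ${daily:,.2f}")
```

Under these assumed numbers the daily total comes to $275.00, with $200 from input tokens and $75 from output tokens. Output tokens cost six times as much as input tokens, so workloads with short responses (classification, moderation, extraction) benefit most from this pricing.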

Why does it matter?

For teams running AI at high volume (content moderation, translation pipelines, data processing), the cost of frontier models can be prohibitive. Flash-Lite targets that gap: fast enough for real-time use, cheap enough for millions of daily calls, and capable enough to beat the other economy-tier models on six of eleven benchmarks.

Who is it for?

High-volume API users, teams running classification/extraction pipelines, developers on a budget.

Try it

ai.google.dev/gemini-api (preview via AI Studio or Vertex AI)


Tags

  • gemini
  • cost-efficient
  • fast-inference
  • multimodal
