AI/TLDR

Gemini 3 Flash

Gemini 3 reasoning at Flash speed and cost — and it beats Gemini 3 Pro on agentic coding.

Overview

Gemini 3 Flash is Google's fast, cost-efficient model in the Gemini 3 family, released on December 17, 2025. It carries over the complex reasoning, multimodal understanding, and tool-use abilities of Gemini 3 Pro, but is tuned for low latency and low price — Google positions it as the everyday workhorse for high-volume, latency-sensitive workloads where Pro would be overkill.

Although it is the smaller, cheaper tier, Gemini 3 Flash keeps a broad feature set. It accepts text, images, audio, video, and PDFs across a 1 million-token context window and returns up to roughly 64K tokens of text output. It supports configurable "thinking levels" (minimal, low, medium, high), structured output, function calling, and automatic context caching. Developers can raise or lower the thinking level per request to trade cost against quality.
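As a rough illustration of choosing a thinking level per request, the sketch below builds a JSON request body without sending it. The helper name `build_request` is hypothetical. The field names `generationConfig`, `thinkingConfig`, and `thinkingLevel` are assumptions based on the Gemini REST naming convention, so check them against the current API reference before relying on them.

```python
import json

# The four thinking levels listed above.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build a generateContent-style request body (hypothetical helper).

    The field names are assumptions, not confirmed by this page.
    """
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(
            f"thinking_level must be one of {THINKING_LEVELS}, got {thinking_level!r}"
        )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }


# Cheap, shallow reasoning for a simple task.
print(json.dumps(build_request("Summarize this changelog.", "minimal"), indent=2))
```

A simple task such as classification or short summarization might use `minimal` or `low`, while a multi-step coding task might justify `high` at a higher token cost.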

On Google's published benchmarks, Gemini 3 Flash scores 90.4% on GPQA Diamond and 81.2% on MMMU-Pro, close to Gemini 3 Pro, and reaches 78% on SWE-bench Verified for agentic coding, where Google reports it edges out Gemini 3 Pro. It is priced at $0.50 per 1M input tokens and $3 per 1M output tokens, well below the Pro tier, and Google reports it runs about 3x faster than Gemini 2.5 Pro.

Released: 2025-12-17
License: Proprietary
Weights: API only
Parameters: Not disclosed
Context: 1M
Max output: 64K
Architecture: Proprietary dense/sparse Transformer (Gemini 3 family); Google has not published parameter counts or architectural details for the Flash tier.
Knowledge cutoff: January 2025
Modalities: Text, Vision, Audio, Video, PDF
Status: Preview

Benchmarks

  1. GPQA Diamond: 90.4%
  2. MMMU-Pro: 81.2%
  3. SWE-bench Verified: 78%
  4. Humanity's Last Exam (no tools): 33.7%
  5. MCP Atlas: 57.4%
  6. Toolathlon: 49.4%

Scores are on a 0–100 scale; higher is better.

Pricing

Input: $0.50 per 1M tokens
Output: $3.00 per 1M tokens

Audio input is priced at $1.00 per 1M tokens. Automatic context caching can reduce effective cost for repeated content. These prices are for the Gemini API, and the model is in preview.
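To make the rates concrete, here is a minimal cost estimate in Python. The function name `estimate_cost` is hypothetical, and the sketch assumes every token is billed at the list rates above. It ignores any discount from context caching, because this page does not state the cached-token rate.

```python
# Rates from the pricing section above, in USD per 1M tokens.
INPUT_RATE = 0.50
AUDIO_INPUT_RATE = 1.00
OUTPUT_RATE = 3.00


def estimate_cost(input_tokens: int, output_tokens: int, audio_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one request at list prices (no caching discount)."""
    total = (
        input_tokens * INPUT_RATE
        + audio_input_tokens * AUDIO_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# A 200K-token document summarized into 4K tokens of output:
# 200,000 * $0.50/1M + 4,000 * $3.00/1M = $0.10 + $0.012
print(round(estimate_cost(200_000, 4_000), 4))  # 0.112
```

Output tokens cost six times as much as text input tokens, so for long generations the output side tends to dominate the bill even when the prompt is large.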

Strengths

  • Reasoning quality close to Gemini 3 Pro at a fraction of the price and latency; Google reports about 3x the speed of Gemini 2.5 Pro
  • Strong agentic coding: 78% on SWE-bench Verified, ahead of Gemini 3 Pro on Google's own figures
  • 1 million-token context window for long documents, codebases, and multi-file inputs
  • Multimodal input: text, images, audio, video, and PDFs in a single request
  • Configurable thinking levels let you tune cost vs. depth per call, with automatic context caching to cut repeated-prompt costs
  • Uses roughly 30% fewer tokens than Gemini 2.5 Pro on typical tasks, lowering effective cost further

Best for

  • High-volume production features where Gemini 3 Pro would be too slow or expensive
  • Agentic coding assistants and code-fixing agents (Gemini CLI, Antigravity, IDE tools)
  • Long-document and PDF analysis, summarization, and extraction over the 1M-token window
  • Multimodal apps that mix images, audio, video, and text in one pipeline
  • Real-time chat and search experiences that need fast responses (it powers AI Mode in Search and the Gemini app)
  • Tool-using and multi-step workflow agents that need solid reasoning at low latency

How to access

Provider | Model ID
Google AI Studio / Gemini API | gemini-3-flash-preview
Google Vertex AI | gemini-3-flash
OpenRouter | google/gemini-3-flash-preview
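Because the model ID differs by provider, an application that can switch providers may want the IDs in one place. The sketch below is one way to do that. The provider keys (`gemini_api`, `vertex_ai`, `openrouter`) and the helper `model_id_for` are hypothetical names chosen for illustration; the ID strings are copied from the table above.

```python
# Model IDs as listed in the access table; the dictionary keys are hypothetical.
MODEL_IDS = {
    "gemini_api": "gemini-3-flash-preview",
    "vertex_ai": "gemini-3-flash",
    "openrouter": "google/gemini-3-flash-preview",
}


def model_id_for(provider: str) -> str:
    """Return the Gemini 3 Flash model ID for a provider key."""
    try:
        return MODEL_IDS[provider]
    except KeyError:
        raise ValueError(
            f"unknown provider {provider!r}; expected one of {sorted(MODEL_IDS)}"
        ) from None


print(model_id_for("openrouter"))  # google/gemini-3-flash-preview
```

Preview IDs can change when a model reaches general availability, so keeping them in a single mapping limits the edit to one place.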

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first.

Version | Released | Context | License
Gemini 3.5 Flash (current) | 2026-05-19 |  | Proprietary
Gemini 3 Flash | 2025-12-17 |  | Proprietary
Gemini 2.5 Flash | 2025-04-17 |  | Proprietary
Gemini 2.0 Flash | 2025-01-30 |  | Proprietary
Gemini 1.5 Flash | 2024-05-14 |  | Proprietary

FAQ

When was Gemini 3 Flash released?

Google released Gemini 3 Flash on December 17, 2025. It rolled out in the Gemini app and AI Mode in Search, and became available through Google AI Studio, Vertex AI, Google Antigravity, and Gemini Enterprise. It launched as a preview model.

How much does Gemini 3 Flash cost?

Via the Gemini API it costs $0.50 per 1M input tokens and $3.00 per 1M output tokens, with audio input priced at $1.00 per 1M tokens. Automatic context caching can lower the effective cost for repeated content.

How does Gemini 3 Flash compare to Gemini 3 Pro?

Gemini 3 Flash is the faster, cheaper tier, yet it stays close to Gemini 3 Pro on reasoning (90.4% on GPQA Diamond, 81.2% on MMMU-Pro). On Google's figures it beats Gemini 3 Pro on agentic coding, scoring 78% on SWE-bench Verified. Note that the "about 3x faster" figure is measured against Gemini 2.5 Pro, not Gemini 3 Pro.

What inputs does Gemini 3 Flash support?

It accepts text, images, audio, video, and PDFs across a 1 million-token context window, and returns text output (up to about 64K tokens). It also supports configurable thinking levels, structured output, and function calling.