Gemini 3 Flash

Name: Gemini 3 Flash
Author: Google

Gemini 3 reasoning at Flash speed and cost — and it beats Gemini 3 Pro on agentic coding.

Overview

Gemini 3 Flash is Google's fast, cost-efficient model in the Gemini 3 family, released on December 17, 2025. It carries over the complex reasoning, multimodal understanding, and tool-use abilities of Gemini 3 Pro, but is tuned for low latency and low price — Google positions it as the everyday workhorse for high-volume, latency-sensitive workloads where Pro would be overkill.

Despite being the smaller, cheaper tier, Gemini 3 Flash punches well above its weight. It accepts text, images, audio, video, and PDFs across a 1 million-token context window and returns up to roughly 64K tokens of text output. It supports configurable 'thinking levels' (minimal, low, medium, high), structured output, function calling, and automatic context caching, so developers can dial reasoning effort up or down per request to trade cost against quality.

On Google's published benchmarks, Gemini 3 Flash scores 90.4% on GPQA Diamond and 81.2% on MMMU-Pro — close to Gemini 3 Pro — and reaches 78% on SWE-bench Verified for agentic coding, where Google reports it actually edges out Gemini 3 Pro. At $0.50 per 1M input tokens and $3 per 1M output tokens, it undercuts the Pro tier substantially while running about 3x faster than Gemini 2.5 Pro.

Released	2025-12-17
License	Proprietary
Weights	API only
Parameters	Not disclosed
Context	1M
Max output	64K
Architecture	Proprietary dense/sparse Transformer (Gemini 3 family); Google has not published parameter counts or architectural details for the Flash tier.
Knowledge cutoff	January 2025
Modalities	Text, Vision, Audio, Video, PDF
Status	Preview

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.50 / 1M tokens per 1M tokens
Output	$3.00 / 1M tokens per 1M tokens

Audio input is priced at $1.00 / 1M tokens. Automatic context caching can reduce effective cost for repeated content. Pricing is for the Gemini API; preview model.

Pricing source ↗

Strengths

Near-Pro reasoning quality at a fraction of the price and latency — about 3x faster than Gemini 2.5 Pro
Strong agentic coding: 78% on SWE-bench Verified, ahead of Gemini 3 Pro on Google's own figures
1 million-token context window for long documents, codebases, and multi-file inputs
Genuinely multimodal input — text, images, audio, video, and PDFs in a single request
Configurable thinking levels let you tune cost vs. depth per call, with automatic context caching to cut repeated-prompt costs
Uses roughly 30% fewer tokens than Gemini 2.5 Pro on typical tasks, lowering effective cost further

Best for

High-volume production features where Gemini 3 Pro would be too slow or expensive
Agentic coding assistants and code-fixing agents (Gemini CLI, Antigravity, IDE tools)
Long-document and PDF analysis, summarization, and extraction over the 1M-token window
Multimodal apps that mix images, audio, video, and text in one pipeline
Real-time chat and search experiences (it powers AI Mode in Search and the Gemini app) needing fast responses
Tool-using and multi-step workflow agents that need solid reasoning at low latency

How to access

Provider	Model ID
Google AI Studio / Gemini API ↗	`gemini-3-flash-preview`
Google Vertex AI ↗	`gemini-3-flash`
OpenRouter ↗	`google/gemini-3-flash-preview`

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.5 Flashcurrent	2026-05-19	—	Proprietary
Gemini 3 Flash	2025-12-17	—	Proprietary
Gemini 2.5 Flash	2025-04-17	—	Proprietary
Gemini 2.0 Flash	2025-01-30	—	Proprietary
Gemini 1.5 Flash	2024-05-14	—	Proprietary

FAQ

When was Gemini 3 Flash released?

Google released Gemini 3 Flash on December 17, 2025, rolling it out to the Gemini app, AI Mode in Search, and via API through Google AI Studio, Vertex AI, Google Antigravity, and Gemini Enterprise. It launched as a preview model.

How much does Gemini 3 Flash cost?

Via the Gemini API it costs $0.50 per 1M input tokens and $3.00 per 1M output tokens, with audio input priced at $1.00 per 1M tokens. Automatic context caching can lower the effective cost for repeated content.

How does Gemini 3 Flash compare to Gemini 3 Pro?

Gemini 3 Flash is the faster, cheaper tier — roughly 3x faster than Gemini 2.5 Pro — yet stays close to Gemini 3 Pro on reasoning (90.4% GPQA Diamond, 81.2% MMMU-Pro). On Google's figures it actually beats Gemini 3 Pro on agentic coding, scoring 78% on SWE-bench Verified.

What inputs does Gemini 3 Flash support?

It accepts text, images, audio, video, and PDFs across a 1 million-token context window, and returns text output (up to about 64K tokens). It also supports configurable thinking levels, structured output, and function calling.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash — every version

// FAQ