Overview
Gemini 3 Flash is Google's fast, cost-efficient model in the Gemini 3 family, released on December 17, 2025. It carries over the complex reasoning, multimodal understanding, and tool-use abilities of Gemini 3 Pro, but is tuned for low latency and low price — Google positions it as the everyday workhorse for high-volume, latency-sensitive workloads where Pro would be overkill.
Despite being the smaller, cheaper tier, Gemini 3 Flash punches well above its weight. It accepts text, images, audio, video, and PDFs across a 1 million-token context window and returns up to roughly 64K tokens of text output. It supports configurable 'thinking levels' (minimal, low, medium, high), structured output, function calling, and automatic context caching, so developers can dial reasoning effort up or down per request to trade cost against quality.
On Google's published benchmarks, Gemini 3 Flash scores 90.4% on GPQA Diamond and 81.2% on MMMU-Pro — close to Gemini 3 Pro — and reaches 78% on SWE-bench Verified for agentic coding, where Google reports it actually edges out Gemini 3 Pro. At $0.50 per 1M input tokens and $3 per 1M output tokens, it undercuts the Pro tier substantially while running about 3x faster than Gemini 2.5 Pro.
| Released | 2025-12-17 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Parameters | Not disclosed |
| Context | 1M |
| Max output | 64K |
| Architecture | Proprietary dense/sparse Transformer (Gemini 3 family); Google has not published parameter counts or architectural details for the Flash tier. |
| Knowledge cutoff | January 2025 |
| Modalities | Text, Vision, Audio, Video, PDF |
| Status | Preview |
Benchmarks
- GPQA Diamond90.4%
- MMMU-Pro81.2%
- SWE-bench Verified78%
- Humanity's Last Exam (no tools)33.7%
- MCP Atlas57.4%
- Toolathlon49.4%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.50 / 1M tokens per 1M tokens |
|---|---|
| Output | $3.00 / 1M tokens per 1M tokens |
Audio input is priced at $1.00 / 1M tokens. Automatic context caching can reduce effective cost for repeated content. Pricing is for the Gemini API; preview model.
Strengths
- Near-Pro reasoning quality at a fraction of the price and latency — about 3x faster than Gemini 2.5 Pro
- Strong agentic coding: 78% on SWE-bench Verified, ahead of Gemini 3 Pro on Google's own figures
- 1 million-token context window for long documents, codebases, and multi-file inputs
- Genuinely multimodal input — text, images, audio, video, and PDFs in a single request
- Configurable thinking levels let you tune cost vs. depth per call, with automatic context caching to cut repeated-prompt costs
- Uses roughly 30% fewer tokens than Gemini 2.5 Pro on typical tasks, lowering effective cost further
Best for
- High-volume production features where Gemini 3 Pro would be too slow or expensive
- Agentic coding assistants and code-fixing agents (Gemini CLI, Antigravity, IDE tools)
- Long-document and PDF analysis, summarization, and extraction over the 1M-token window
- Multimodal apps that mix images, audio, video, and text in one pipeline
- Real-time chat and search experiences (it powers AI Mode in Search and the Gemini app) needing fast responses
- Tool-using and multi-step workflow agents that need solid reasoning at low latency
How to access
| Provider | Model ID |
|---|---|
| Google AI Studio / Gemini API ↗ | gemini-3-flash-preview |
| Google Vertex AI ↗ | gemini-3-flash |
| OpenRouter ↗ | google/gemini-3-flash-preview |
Gemini Flash — every version
The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Gemini 3.5 Flashcurrent | 2026-05-19 | — | Proprietary |
| Gemini 3 Flash | 2025-12-17 | — | Proprietary |
| Gemini 2.5 Flash | 2025-04-17 | — | Proprietary |
| Gemini 2.0 Flash | 2025-01-30 | — | Proprietary |
| Gemini 1.5 Flash | 2024-05-14 | — | Proprietary |
FAQ
When was Gemini 3 Flash released?
Google released Gemini 3 Flash on December 17, 2025, rolling it out to the Gemini app, AI Mode in Search, and via API through Google AI Studio, Vertex AI, Google Antigravity, and Gemini Enterprise. It launched as a preview model.
How much does Gemini 3 Flash cost?
Via the Gemini API it costs $0.50 per 1M input tokens and $3.00 per 1M output tokens, with audio input priced at $1.00 per 1M tokens. Automatic context caching can lower the effective cost for repeated content.
How does Gemini 3 Flash compare to Gemini 3 Pro?
Gemini 3 Flash is the faster, cheaper tier — roughly 3x faster than Gemini 2.5 Pro — yet stays close to Gemini 3 Pro on reasoning (90.4% GPQA Diamond, 81.2% MMMU-Pro). On Google's figures it actually beats Gemini 3 Pro on agentic coding, scoring 78% on SWE-bench Verified.
What inputs does Gemini 3 Flash support?
It accepts text, images, audio, video, and PDFs across a 1 million-token context window, and returns text output (up to about 64K tokens). It also supports configurable thinking levels, structured output, and function calling.