Gemini 3.5 Flash

Name: Gemini 3.5 Flash
Author: Google

Google's fast Flash model that beats Gemini 3.1 Pro on agentic and coding tasks at ~4x the speed.

Overview

Gemini 3.5 Flash is Google's fast, cost-efficient model in the Gemini Flash line, announced at Google I/O 2026 and made generally available on May 19, 2026 as the API model gemini-3.5-flash. It is the current flagship of the Flash tier, positioned as Google's most intelligent model for sustained agentic and coding work, and it is the first Flash-tier model to outperform a previous Pro tier (Gemini 3.1 Pro) on Google's published coding and agentic benchmarks.

Gemini 3.5 Flash accepts text, images, audio, video, and PDFs/documents as input and returns text, with a 1 million-token context window and up to roughly 64K output tokens. Reasoning depth is adjustable through a thinking_level setting (minimal, low, medium, high), and Google reports the model delivers output around four times faster than other frontier models, making it well suited to high-throughput agent and tool-use loops.

The model is available through the Gemini API, Google AI Studio, Vertex AI, the Gemini app, Google Search AI Mode, and Google's Antigravity agentic IDE, and it is selectable in third-party tools such as Cursor. Its knowledge cutoff is January 2025; for newer information Google recommends the built-in search grounding tool.

Released	2026-05-19
License	Proprietary
Weights	API only
Parameters	Not disclosed
Context	1M
Max output	64K
Architecture	Multimodal Gemini model built on Gemini 3 Flash. Reasoning depth is controllable through a thinking_level parameter (minimal, low, medium, high; medium is the default) rather than a numeric token budget, and the model dynamically allocates more compute to harder problems. It preserves intermediate reasoning across turns ("thought preservation") and natively supports function calling, structured output, code execution, and search grounding. Google has not disclosed the parameter count.
Knowledge cutoff	January 2025
Modalities	Text, Vision, Audio, Video, PDF
Status	Generally available

Benchmarks

Performance comparison table of Gemini 3.5 Flash versus Gemini 3 Flash, Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7 and GPT-5.5 across coding, agentic, multimodal and long-context benchmarks. — Gemini 3.5 Flash benchmark comparison versus other Gemini, Claude and GPT models. — Google

Gemini 3.5 Flash benchmark comparison (results as of May 2026).

Benchmark	Gemini 3.5 Flash	Gemini 3 Flash	Gemini 3.1 Pro	Claude Sonnet 4.6	Claude Opus 4.7	GPT-5.5
Terminal-bench 2.1 (Agentic terminal coding, Terminus-2 harness)	76.2%	58%	70.3%	—	66.1%	78.2%
SWE-Bench Pro (Public) (Diverse agentic coding tasks, single attempt)	55.1%	49.6%	54.2%	—	64.3%	58.6%
MCP Atlas (Multi-step workflows using MCP)	83.6%	62%	78.2%	69.5%	79.1%	75.3%
Toolathlon (Real-world general tool use)	56.5%	49.4%	—	—	—	55.6%
OSWorld-Verified (Agentic computer use)	78.4%	65.1%	76.2%	72.5%	78%	78.7%
Finance Agent v2 (Financial analysis and decision-making)	57.9%	42.6%	43%	51%	51.5%	51.8%
GDPval-AA (Economically valuable knowledge work)	1656 Elo	1204 Elo	1314 Elo	1676 Elo	1753 Elo	1769 Elo
CharXiv (Information synthesis from complex charts, no tools)	84.2%	80.3%	83.3%	72.4%	82.1%	84.1%
MMMU-Pro (Multimodal understanding and reasoning, no tools)	83.6%	81.2%	80.5%	74.5%	75.2%	81.2%
Blueprint-Bench 2 (Agentic spatial reasoning, normalized score)	33.6%	0%	26.5%	6.7%	24.5%	36.2%
MRCR v2 (8-needle) (Long context 128k, average)	77.3%	67.2%	84.9%	84.9%	59.3%	94.8%
MRCR v2 (1M) (Long context 1M, pointwise)	26.6%	22.1%	26.3%	—	—	—
Humanity's Last Exam (Academic reasoning, full set, text + MM)	40.2%	33.7%	44.4%	33.2%	46.9%	41.4%
ARC-AGI-2 (Abstract reasoning puzzles)	72.1%	33.6%	77.1%	58.3%	75.8%	84.6%

Comparison source ↗

This model's scores

Terminal-Bench 2.1 (agentic coding)76.2%
SWE-Bench Pro55.1%
MCP Atlas (multi-step workflows)83.6%
MMMU-Pro (multimodal reasoning)83.6%
ARC-AGI-272.1%
CharXiv Reasoning (chart understanding)84.2%
Artificial Analysis Intelligence Index55%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$1.50 / 1M tokens per 1M tokens
Cached input	$0.15 / 1M tokens per 1M tokens
Output	$9.00 / 1M tokens per 1M tokens

Paid tier; output price includes thinking tokens. Context caching also incurs $1.00 per 1M tokens per hour of storage. A free tier is available with usage limits.

Pricing source ↗

Strengths

Frontier agentic and coding performance that surpasses Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas, and Finance Agent v2
Very high output speed (Artificial Analysis measured ~280+ tokens/sec, roughly 70% faster than Gemini 3 Flash)
1 million-token context window for long-document and multi-file reasoning
Native multimodal input across text, image, audio, video, and PDF
Controllable reasoning via thinking_level plus dynamic compute allocation
Built-in tool use: function calling, structured output, code execution, and search grounding

Best for

Coding agents and autonomous multi-step software tasks (e.g. terminal and SWE-style workflows)
Tool-using and MCP-driven agents that need fast, cheap iterations
Long-context document analysis and summarization across up to 1M tokens
Multimodal understanding of charts, images, audio, and video
High-volume production workloads where latency and cost matter
Structured data extraction and function-calling pipelines

How to access

Provider	Model ID
Google Gemini API ↗	`gemini-3.5-flash`
Google Vertex AI ↗	`gemini-3.5-flash`
OpenRouter ↗	`google/gemini-3.5-flash`

Gemini Flash — every version

The full lineage of the Gemini Flash line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.5 Flashcurrent	2026-05-19	—	Proprietary
Gemini 3 Flash	2025-12-17	—	Proprietary
Gemini 2.5 Flash	2025-04-17	—	Proprietary
Gemini 2.0 Flash	2025-01-30	—	Proprietary
Gemini 1.5 Flash	2024-05-14	—	Proprietary

FAQ

When was Gemini 3.5 Flash released?

Google announced Gemini 3.5 Flash at Google I/O 2026 and made it generally available on May 19, 2026, as the API model gemini-3.5-flash.

What is the context window and output limit of Gemini 3.5 Flash?

It has a 1 million-token input context window and can produce up to about 64K output tokens. It accepts text, images, audio, video, and PDFs as input and returns text.

How much does Gemini 3.5 Flash cost?

On the paid tier it is $1.50 per 1M input tokens and $9.00 per 1M output tokens (output includes thinking tokens), with cached input at $0.15 per 1M tokens. A free tier with usage limits is also available.

How does Gemini 3.5 Flash compare to Gemini 3.1 Pro?

It is the first Flash-tier model to beat a previous Pro tier on Google's published benchmarks, outperforming Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas, and Finance Agent v2 while running roughly four times faster.

// Overview

// Benchmarks

This model's scores

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash — every version

// FAQ