Gemini 1.5 Flash-8B

Name: Gemini 1.5 Flash-8B
Author: Google

Google's smallest, cheapest Gemini 1.5 model — an ~8B multimodal workhorse with a 1M-token context (now retired).

Overview

Gemini 1.5 Flash-8B was Google's smallest and lowest-cost model in the Gemini 1.5 family. Announced in preview during 2024 and made generally available on October 3, 2024 (as gemini-1.5-flash-8b-001), it was a transformer-decoder model derived from Gemini 1.5 Flash — distilled down to roughly 8 billion parameters while keeping the same multimodal capabilities and a context window exceeding 1 million tokens. Google described it as having the lowest cost per intelligence of any Gemini model at the time, and made it the seed of what later became the Gemini Flash-Lite tier.

Despite its tiny size, Gemini 1.5 Flash-8B accepted text, images, audio, and video as input and could read up to about 1,048,576 tokens of context (returning up to 8,192 output tokens). Google's own technical report showed it reaching roughly 80–90% of full Gemini 1.5 Flash quality on many benchmarks, making it well suited to high-volume, latency-sensitive jobs: chat, transcription, long-context translation, large-scale data labeling, and high-throughput agent serving. It launched at half the price of 1.5 Flash with double the rate limits and was available free through Google AI Studio and the Gemini API.

Gemini 1.5 Flash-8B is no longer a current model. Google deprecated the entire Gemini 1.5 line and shut down gemini-1.5-flash-8b on September 24, 2025, after which API calls return errors. Google steered users toward the cheaper, faster successors in the Flash and Flash-Lite lines (Gemini 2.0 Flash / Flash-Lite and later 2.5 Flash-Lite). This page documents it as a historical model.

Released	2024-10-03
License	Proprietary
Weights	API only
Parameters	~8B (single-digit billions; exact count undisclosed)
Context	1M
Max output	8K
Architecture	Transformer decoder model derived from Gemini 1.5 Flash, inheriting the same core architecture, optimizations, and data-mixture refinements at a smaller (~8B parameter) scale, with native multimodal support and a 1M+ token context window. Google did not publish the full parameter count.
Knowledge cutoff	August 2024
Modalities	Text, Vision, Audio, Video
Status	Discontinued (shut down September 24, 2025)

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.0375 / 1M tokens (prompts < 128K) per 1M tokens
Cached input	$0.01 / 1M tokens (prompts < 128K) per 1M tokens
Output	$0.15 / 1M tokens (prompts < 128K) per 1M tokens

Launch pricing from October 2024; prices doubled for prompts longer than 128K tokens. A free tier was available via Google AI Studio and the Gemini API. The model was shut down on September 24, 2025 and is no longer billable.

Pricing source ↗

Strengths

Extremely low cost: at launch it was the cheapest Gemini model, at $0.0375 per 1M input tokens and $0.15 per 1M output tokens for prompts under 128K — half the price of Gemini 1.5 Flash
1M+ token context window despite an ~8B parameter footprint, with quality holding up to very long sequences in Google's long-context tests
Native multimodal input across text, images, audio, and video — unusual for a single-digit-billion-parameter model
High throughput and low latency, with double the rate limits of 1.5 Flash (up to 4,000 requests per minute), aimed at large-scale deployments
Retained roughly 80–90% of full Gemini 1.5 Flash performance on many benchmarks while being far smaller and cheaper
Free access through Google AI Studio and the Gemini API free tier

Best for

High-volume chat, summarization, and classification where cost-per-token dominates
Audio transcription and long-context language translation
Large-scale automated data labeling to bootstrap training sets for other models
High-throughput agent serving and integration into multi-model workflows
Long-document and code analysis within a single 1M-token window

How to access

Provider	Model ID
Google AI Studio / Gemini API ↗	`gemini-1.5-flash-8b`

Gemini Flash-Lite — every version

The full lineage of the Gemini Flash-Lite line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemini 3.1 Flash-Litecurrent	2026-03-03	—	Proprietary
Gemini 2.5 Flash-Lite	2025-06-17	—	Proprietary
Gemini 2.0 Flash-Lite	2025-02-01	—	Proprietary
Gemini 1.5 Flash-8B	2024-10-03	—	Proprietary

FAQ

When was Gemini 1.5 Flash-8B released, and is it still available?

Google made Gemini 1.5 Flash-8B (gemini-1.5-flash-8b-001) generally available on October 3, 2024. It is no longer available: Google deprecated the Gemini 1.5 line and shut the model down on September 24, 2025, so API calls now return errors. Google recommends migrating to newer Flash and Flash-Lite models such as Gemini 2.0 Flash.

How big is Gemini 1.5 Flash-8B and what could it do?

It was a roughly 8-billion-parameter transformer-decoder model distilled from Gemini 1.5 Flash. Despite its small size it was natively multimodal — accepting text, images, audio, and video — with a context window exceeding 1 million tokens (about 1,048,576 input tokens) and up to 8,192 output tokens. Google did not publish the exact parameter count.

How much did Gemini 1.5 Flash-8B cost?

At launch it was the cheapest Gemini model: $0.0375 per million input tokens, $0.15 per million output tokens, and $0.01 per million cached input tokens for prompts under 128K. Prices doubled for prompts longer than 128K, and a free tier was available through Google AI Studio and the Gemini API. As of its September 2025 shutdown it is no longer billable.

How well did Gemini 1.5 Flash-8B perform compared to Gemini 1.5 Flash?

Google's technical report showed Flash-8B reaching roughly 80–90% of full Gemini 1.5 Flash quality on many benchmarks. For example it scored 68.1% on MMLU (vs 78.9% for Flash), 50.3% on MMMU, 70.5% on MGSM, and 73.6% on DocVQA — a small quality trade-off in exchange for much higher throughput and lower cost.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemini Flash-Lite — every version

// FAQ