Gemma 3n

Name: Gemma 3n
Author: Google

Google's mobile-first open multimodal model — runs text, image, audio and video on a phone with as little as 2-3 GB of memory.

Overview

Gemma 3n is Google's mobile-first open-weights model, released in full on June 26, 2025 after a May 2025 preview. It belongs to the Gemma family derived from Gemini research and is built specifically to run on phones, laptops and tablets rather than in the data center. Gemma 3n ships in two sizes: E2B and E4B. The 'E' stands for effective parameters — E2B has roughly 5B raw parameters but runs with the memory footprint of a 2B model (about 2 GB), and E4B has about 8B raw parameters but the footprint of a 4B model (about 3 GB).

What makes Gemma 3n unusual is its architecture. It uses a MatFormer (Matryoshka Transformer) design that nests a smaller, fully functional sub-model inside the larger one, so developers can trade quality for speed without swapping models. Per-Layer Embeddings (PLE) let much of the embedding data stay in fast storage instead of accelerator memory, cutting the active footprint further. A MobileNet-V5 vision encoder and a Universal Speech Model-based audio encoder make it natively multimodal: it accepts text, image, audio and video input and produces text output, including on-device speech-to-text and translation without an internet connection.

Gemma 3n was trained on roughly 11 trillion tokens with a knowledge cutoff of June 2024, supports text across 140 languages, and offers a 32K-token context window. The instruction-tuned E4B was the first model under 10B parameters to pass an LMArena Elo of 1300. Weights for both sizes are freely downloadable from Hugging Face and Kaggle under the Gemma Terms of Use, which permits responsible commercial use.

Released	2025-06-26
License	Gemma Terms of Use (open weights, commercial use permitted)
Weights	Open weights
Parameters	E2B: 5B raw / ~2B effective; E4B: 8B raw / ~4B effective
Context	32K
Max output	32K tokens (shared with input)
Architecture	MatFormer (Matryoshka Transformer) with nested sub-models for elastic inference, Per-Layer Embeddings (PLE) for memory efficiency, a MobileNet-V5 vision encoder, and a Universal Speech Model (USM)-based audio encoder, plus LAuReL and AltUp optimizations. Trained on ~11 trillion tokens.
Knowledge cutoff	June 2024
Modalities	Text, Vision, Audio, Video
Status	Available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.06 / 1M tokens per 1M tokens
Output	$0.12 / 1M tokens per 1M tokens

Weights are free to download and self-host under the Gemma Terms of Use; the listed price is third-party hosted inference for Gemma 3n E4B on OpenRouter. Other providers (e.g. Together AI) list input as low as $0.02 / 1M tokens.

Pricing source ↗

Strengths

Runs locally on phones, laptops and tablets with as little as 2 GB (E2B) or 3 GB (E4B) of memory
Natively multimodal: accepts text, image, audio and video input in a single model
On-device speech-to-text and speech translation with no internet connection required
MatFormer architecture lets one model serve multiple quality/latency points (Mix-n-Match)
Open weights under a commercially permissive license, downloadable from Hugging Face and Kaggle
Strong coding scores for its size (HumanEval 75.0 on E4B)
140-language text coverage with multimodal understanding across 35 languages

Best for

On-device assistants and chat apps that work offline
Real-time speech transcription and translation on mobile hardware
Image and short-video understanding in privacy-sensitive apps
Edge and IoT deployments where cloud inference is too costly or too slow
Prototyping multimodal features without per-token API fees
Fine-tuning a compact open model for a domain-specific task

How to access

Provider	Model ID
OpenRouter ↗	`google/gemma-3n-e4b-it`
Hugging Face ↗	`google/gemma-3n-E4B-it`

Gemma (open weights) — every version

The full lineage of the Gemma (open weights) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemma 4current	2026-04-02	—	Apache-2.0
Gemma 3n	2025-06-26	—	Open weights
Gemma 3	2025-03-12	—	Open weights
Gemma 2	2024-06-27	—	Open weights
Gemma 1	2024-02-21	—	Open weights

FAQ

What does the 'n' in Gemma 3n mean and what are E2B and E4B?

The 'n' marks the mobile/on-device branch of the Gemma 3 family. It comes in two sizes, E2B and E4B, where 'E' stands for effective parameters. E2B has about 5B raw parameters but runs with the memory footprint of a 2B model (around 2 GB), and E4B has about 8B raw parameters with the footprint of a 4B model (around 3 GB), thanks to Per-Layer Embeddings and the MatFormer architecture.

What inputs can Gemma 3n handle?

Gemma 3n is natively multimodal. It accepts text, image, audio and video input and produces text output. Its built-in audio encoder enables on-device speech-to-text and speech translation, and a MobileNet-V5 vision encoder handles images and short video — all without requiring an internet connection.

Is Gemma 3n free and can I use it commercially?

The weights for both E2B and E4B are free to download from Hugging Face and Kaggle and can be self-hosted. They are released under the Gemma Terms of Use, which permits responsible commercial use. (Note: only Gemma 4, released in 2026, switched to the Apache 2.0 license — Gemma 3n uses the custom Gemma license.) Third-party providers also host it for a per-token fee.

How large is the context window?

Gemma 3n has a 32K-token context window, and the maximum output of up to 32K tokens is shared with the input. Its training data has a knowledge cutoff of June 2024.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Gemma (open weights) — every version

// FAQ