Overview
Gemma 3n is Google's mobile-first open-weights model, released in full on June 26, 2025 after a May 2025 preview. It belongs to the Gemma family derived from Gemini research and is built specifically to run on phones, laptops and tablets rather than in the data center. Gemma 3n ships in two sizes: E2B and E4B. The 'E' stands for effective parameters — E2B has roughly 5B raw parameters but runs with the memory footprint of a 2B model (about 2 GB), and E4B has about 8B raw parameters but the footprint of a 4B model (about 3 GB).
What makes Gemma 3n unusual is its architecture. It uses a MatFormer (Matryoshka Transformer) design that nests a smaller, fully functional sub-model inside the larger one, so developers can trade quality for speed without swapping models. Per-Layer Embeddings (PLE) let much of the embedding data stay in fast storage instead of accelerator memory, cutting the active footprint further. A MobileNet-V5 vision encoder and a Universal Speech Model-based audio encoder make it natively multimodal: it accepts text, image, audio and video input and produces text output, including on-device speech-to-text and translation without an internet connection.
Gemma 3n was trained on roughly 11 trillion tokens with a knowledge cutoff of June 2024, supports text across 140 languages, and offers a 32K-token context window. The instruction-tuned E4B was the first model under 10B parameters to pass an LMArena Elo of 1300. Weights for both sizes are freely downloadable from Hugging Face and Kaggle under the Gemma Terms of Use, which permits responsible commercial use.
| Released | 2025-06-26 |
|---|---|
| License | Gemma Terms of Use (open weights, commercial use permitted) |
| Weights | Open weights |
| Parameters | E2B: 5B raw / ~2B effective; E4B: 8B raw / ~4B effective |
| Context | 32K |
| Max output | 32K tokens (shared with input) |
| Architecture | MatFormer (Matryoshka Transformer) with nested sub-models for elastic inference, Per-Layer Embeddings (PLE) for memory efficiency, a MobileNet-V5 vision encoder, and a Universal Speech Model (USM)-based audio encoder, plus LAuReL and AltUp optimizations. Trained on ~11 trillion tokens. |
| Knowledge cutoff | June 2024 |
| Modalities | Text, Vision, Audio, Video |
| Status | Available |
Benchmarks
- MMLU (E4B, 0-shot)64.9%
- HellaSwag (E4B, 10-shot)78.6%
- HumanEval (E4B, pass@1)75%
- MBPP (E4B, 3-shot pass@1)63.6%
- ARC-c (E4B, 25-shot)61.6%
- BIG-Bench Hard (E4B, few-shot)52.9%
- MMLU (E2B, 0-shot)60.1%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.06 / 1M tokens per 1M tokens |
|---|---|
| Output | $0.12 / 1M tokens per 1M tokens |
Weights are free to download and self-host under the Gemma Terms of Use; the listed price is third-party hosted inference for Gemma 3n E4B on OpenRouter. Other providers (e.g. Together AI) list input as low as $0.02 / 1M tokens.
Strengths
- Runs locally on phones, laptops and tablets with as little as 2 GB (E2B) or 3 GB (E4B) of memory
- Natively multimodal: accepts text, image, audio and video input in a single model
- On-device speech-to-text and speech translation with no internet connection required
- MatFormer architecture lets one model serve multiple quality/latency points (Mix-n-Match)
- Open weights under a commercially permissive license, downloadable from Hugging Face and Kaggle
- Strong coding scores for its size (HumanEval 75.0 on E4B)
- 140-language text coverage with multimodal understanding across 35 languages
Best for
- On-device assistants and chat apps that work offline
- Real-time speech transcription and translation on mobile hardware
- Image and short-video understanding in privacy-sensitive apps
- Edge and IoT deployments where cloud inference is too costly or too slow
- Prototyping multimodal features without per-token API fees
- Fine-tuning a compact open model for a domain-specific task
How to access
| Provider | Model ID |
|---|---|
| OpenRouter ↗ | google/gemma-3n-e4b-it |
| Hugging Face ↗ | google/gemma-3n-E4B-it |
Gemma (open weights) — every version
The full lineage of the Gemma (open weights) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
What does the 'n' in Gemma 3n mean and what are E2B and E4B?
The 'n' marks the mobile/on-device branch of the Gemma 3 family. It comes in two sizes, E2B and E4B, where 'E' stands for effective parameters. E2B has about 5B raw parameters but runs with the memory footprint of a 2B model (around 2 GB), and E4B has about 8B raw parameters with the footprint of a 4B model (around 3 GB), thanks to Per-Layer Embeddings and the MatFormer architecture.
What inputs can Gemma 3n handle?
Gemma 3n is natively multimodal. It accepts text, image, audio and video input and produces text output. Its built-in audio encoder enables on-device speech-to-text and speech translation, and a MobileNet-V5 vision encoder handles images and short video — all without requiring an internet connection.
Is Gemma 3n free and can I use it commercially?
The weights for both E2B and E4B are free to download from Hugging Face and Kaggle and can be self-hosted. They are released under the Gemma Terms of Use, which permits responsible commercial use. (Note: only Gemma 4, released in 2026, switched to the Apache 2.0 license — Gemma 3n uses the custom Gemma license.) Third-party providers also host it for a per-token fee.
How large is the context window?
Gemma 3n has a 32K-token context window, and the maximum output of up to 32K tokens is shared with the input. Its training data has a knowledge cutoff of June 2024.