Overview
Gemma 1 is the first generation of Google's open-weights large language model family, announced February 21, 2024 and developed by Google DeepMind. It was released in two sizes — Gemma 2B and Gemma 7B — each available as a pre-trained base model and an instruction-tuned (IT) chat variant. Google positioned Gemma as a lightweight, openly distributed counterpart to its closed Gemini models, sharing Gemini's tokenizer, 256,128-token vocabulary, and core architectural research while shrinking the weights down to a size that runs on a single consumer GPU, a workstation, or a laptop.
Architecturally, Gemma 1 is a decoder-only transformer with an 8,192-token context window, RoPE embeddings, GeGLU activations, and RMSNorm. The 7B model uses standard multi-head attention; the 2B model uses multi-query attention and was distilled from the larger model. The 2B variant has about 2.5 billion total parameters and the 7B about 8.5 billion (the headline names refer to non-embedding parameter counts). Google trained the 2B model on roughly 3 trillion tokens and the 7B on roughly 6 trillion tokens of mostly-English web text, code, and math, using TPUv5e with JAX. The instruction-tuned variants were aligned with supervised fine-tuning and reinforcement learning from human feedback.
At launch Gemma was distributed under the custom Gemma Terms of Use — a source-available license that permits responsible commercial use and redistribution but carries a Prohibited Use Policy and flow-down obligations, rather than the Apache 2.0 license Google later adopted for newer Gemma generations. Google shipped toolchain support across JAX, PyTorch, and Keras 3.0, plus Hugging Face, NVIDIA NeMo, and TensorRT-LLM, and made the weights available through Kaggle, Hugging Face, and Vertex AI. An April 5, 2024 maintenance update introduced Gemma 1.1, with improved instruction-tuned checkpoints. Gemma 1 has since been superseded by Gemma 2, Gemma 3, and Gemma 4.
| Released | 2024-02-21 |
|---|---|
| License | Gemma Terms of Use (custom source-available license; permits responsible commercial use and redistribution, with a Prohibited Use Policy and flow-down obligations). Not Apache 2.0 — that change came with later generations. |
| Weights | Open weights |
| Parameters | 2B (~2.5B total) and 7B (~8.5B total) |
| Context | 8,192 tokens |
| Max output | Not separately specified; output shares the 8,192-token context window. |
| Architecture | Decoder-only transformer, built on the same research, vocabulary, and tokenizer as Gemini. Uses RoPE positional embeddings, GeGLU activations, and RMSNorm. The 7B model uses multi-head attention; the smaller 2B uses multi-query attention (and is distilled from the 7B). 256,128-token SentencePiece vocabulary. Trained on a context length of 8,192 tokens — 2B on roughly 3 trillion tokens and 7B on roughly 6 trillion tokens of primarily English web documents, code, and mathematics, on TPUv5e with JAX and ML Pathways. |
| Knowledge cutoff | Not officially published by Google for Gemma 1. |
| Modalities | text |
| Status | Superseded. Gemma 1 (2B/7B, v1.0 and the April 2024 v1.1 update) was Google's first open-weights generation and remains downloadable, but it has been succeeded by Gemma 2, Gemma 3, and Gemma 4. New projects should use a later Gemma generation; Gemma 1 weights stay available for reproducibility and legacy use. |
Benchmarks
- MMLU (5-shot)64.3%
- HellaSwag (0-shot)81.2%
- HumanEval (pass@1)32.3%
- GSM8K (maj@1)46.4%
- MATH (4-shot)24.3%
- ARC-c53.2%
- PIQA (0-shot)81.2%
- BoolQ (0-shot)83.2%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- First Google open-weights LLM — downloadable weights you can self-host, fine-tune, and deploy commercially under the Gemma Terms of Use
- Small enough to run on a single consumer GPU, workstation, or laptop (2B and 7B sizes)
- Built on Gemini research: shared tokenizer, 256k vocabulary, and architecture, distilled into open weights
- Strong benchmark results for its size class at launch, beating similarly-sized open models on 11 of 18 text tasks per the technical report
- Broad framework support out of the box: JAX, PyTorch, Keras 3.0, Hugging Face, NeMo, TensorRT-LLM
- Both pre-trained base and instruction-tuned chat variants for each size
Best for
- Running a capable open LLM locally for privacy-sensitive or offline workloads
- Fine-tuning a small open base model on proprietary data with LoRA/QLoRA
- Text generation, summarization, and chatbot prototyping on modest hardware
- On-device and edge deployments where a 2B model fits the compute budget
- Research and reproducibility work that needs open weights rather than an API
- Cost-controlled self-hosted inference as an alternative to closed hosted APIs
How to access
| Provider | Model ID |
|---|---|
| Hugging Face (google/gemma-7b) ↗ | google/gemma-7b |
| Hugging Face (google/gemma-2b) ↗ | google/gemma-2b |
| Google AI for Developers (Gemma) ↗ | gemma-7b |
Gemma (open weights) — every version
The full lineage of the Gemma (open weights) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
What is Gemma 1 and who made it?
Gemma 1 is Google DeepMind's first family of open-weights large language models, announced on February 21, 2024. It came in two sizes — 2B and 7B — each with a pre-trained base model and an instruction-tuned chat variant, and was built using the same research and tokenizer as Google's closed Gemini models.
Is Gemma 1 free and open source?
The weights are openly downloadable and free to use, including for commercial products, but Gemma 1 is distributed under the custom Gemma Terms of Use rather than a standard open-source license like Apache 2.0. That license adds a Prohibited Use Policy and flow-down obligations, so it is best described as 'open weights' / source-available rather than OSI open source. Later Gemma generations moved to Apache 2.0.
What context window does Gemma 1 support?
Gemma 1 was trained and serves with an 8,192-token context window for both the 2B and 7B sizes. Later Gemma generations expanded this substantially.
Should I still use Gemma 1 today?
For most new work, no — Gemma 1 has been superseded by Gemma 2, Gemma 3, and Gemma 4, which are stronger and more efficient. Gemma 1 weights remain available for reproducibility, legacy deployments, and research that specifically targets the original release.