Gemma 1

Name: Gemma 1
Author: Google

Google's first open-weights model family, distilled from Gemini research into 2B and 7B sizes you can run on a single GPU.

Overview

Gemma 1 is the first generation of Google's open-weights large language model family, announced February 21, 2024 and developed by Google DeepMind. It was released in two sizes — Gemma 2B and Gemma 7B — each available as a pre-trained base model and an instruction-tuned (IT) chat variant. Google positioned Gemma as a lightweight, openly distributed counterpart to its closed Gemini models, sharing Gemini's tokenizer, 256,128-token vocabulary, and core architectural research while shrinking the weights down to a size that runs on a single consumer GPU, a workstation, or a laptop.

Architecturally, Gemma 1 is a decoder-only transformer with an 8,192-token context window, RoPE embeddings, GeGLU activations, and RMSNorm. The 7B model uses standard multi-head attention; the 2B model uses multi-query attention and was distilled from the larger model. The 2B variant has about 2.5 billion total parameters and the 7B about 8.5 billion (the headline names refer to non-embedding parameter counts). Google trained the 2B model on roughly 3 trillion tokens and the 7B on roughly 6 trillion tokens of mostly-English web text, code, and math, using TPUv5e with JAX. The instruction-tuned variants were aligned with supervised fine-tuning and reinforcement learning from human feedback.

At launch Gemma was distributed under the custom Gemma Terms of Use — a source-available license that permits responsible commercial use and redistribution but carries a Prohibited Use Policy and flow-down obligations, rather than the Apache 2.0 license Google later adopted for newer Gemma generations. Google shipped toolchain support across JAX, PyTorch, and Keras 3.0, plus Hugging Face, NVIDIA NeMo, and TensorRT-LLM, and made the weights available through Kaggle, Hugging Face, and Vertex AI. An April 5, 2024 maintenance update introduced Gemma 1.1, with improved instruction-tuned checkpoints. Gemma 1 has since been superseded by Gemma 2, Gemma 3, and Gemma 4.

Released	2024-02-21
License	Gemma Terms of Use (custom source-available license; permits responsible commercial use and redistribution, with a Prohibited Use Policy and flow-down obligations). Not Apache 2.0 — that change came with later generations.
Weights	Open weights
Parameters	2B (~2.5B total) and 7B (~8.5B total)
Context	8,192 tokens
Max output	Not separately specified; output shares the 8,192-token context window.
Architecture	Decoder-only transformer, built on the same research, vocabulary, and tokenizer as Gemini. Uses RoPE positional embeddings, GeGLU activations, and RMSNorm. The 7B model uses multi-head attention; the smaller 2B uses multi-query attention (and is distilled from the 7B). 256,128-token SentencePiece vocabulary. Trained on a context length of 8,192 tokens — 2B on roughly 3 trillion tokens and 7B on roughly 6 trillion tokens of primarily English web documents, code, and mathematics, on TPUv5e with JAX and ML Pathways.
Knowledge cutoff	Not officially published by Google for Gemma 1.
Modalities	text
Status	Superseded. Gemma 1 (2B/7B, v1.0 and the April 2024 v1.1 update) was Google's first open-weights generation and remains downloadable, but it has been succeeded by Gemma 2, Gemma 3, and Gemma 4. New projects should use a later Gemma generation; Gemma 1 weights stay available for reproducibility and legacy use.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

First Google open-weights LLM — downloadable weights you can self-host, fine-tune, and deploy commercially under the Gemma Terms of Use
Small enough to run on a single consumer GPU, workstation, or laptop (2B and 7B sizes)
Built on Gemini research: shared tokenizer, 256k vocabulary, and architecture, distilled into open weights
Strong benchmark results for its size class at launch, beating similarly-sized open models on 11 of 18 text tasks per the technical report
Broad framework support out of the box: JAX, PyTorch, Keras 3.0, Hugging Face, NeMo, TensorRT-LLM
Both pre-trained base and instruction-tuned chat variants for each size

Best for

Running a capable open LLM locally for privacy-sensitive or offline workloads
Fine-tuning a small open base model on proprietary data with LoRA/QLoRA
Text generation, summarization, and chatbot prototyping on modest hardware
On-device and edge deployments where a 2B model fits the compute budget
Research and reproducibility work that needs open weights rather than an API
Cost-controlled self-hosted inference as an alternative to closed hosted APIs

How to access

Provider	Model ID
Hugging Face (google/gemma-7b) ↗	`google/gemma-7b`
Hugging Face (google/gemma-2b) ↗	`google/gemma-2b`
Google AI for Developers (Gemma) ↗	`gemma-7b`

Gemma (open weights) — every version

The full lineage of the Gemma (open weights) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Gemma 4current	2026-04-02	—	Apache-2.0
Gemma 3n	2025-06-26	—	Open weights
Gemma 3	2025-03-12	—	Open weights
Gemma 2	2024-06-27	—	Open weights
Gemma 1	2024-02-21	—	Open weights

FAQ

What is Gemma 1 and who made it?

Gemma 1 is Google DeepMind's first family of open-weights large language models, announced on February 21, 2024. It came in two sizes — 2B and 7B — each with a pre-trained base model and an instruction-tuned chat variant, and was built using the same research and tokenizer as Google's closed Gemini models.

Is Gemma 1 free and open source?

The weights are openly downloadable and free to use, including for commercial products, but Gemma 1 is distributed under the custom Gemma Terms of Use rather than a standard open-source license like Apache 2.0. That license adds a Prohibited Use Policy and flow-down obligations, so it is best described as 'open weights' / source-available rather than OSI open source. Later Gemma generations moved to Apache 2.0.

What context window does Gemma 1 support?

Gemma 1 was trained and serves with an 8,192-token context window for both the 2B and 7B sizes. Later Gemma generations expanded this substantially.

Should I still use Gemma 1 today?

For most new work, no — Gemma 1 has been superseded by Gemma 2, Gemma 3, and Gemma 4, which are stronger and more efficient. Gemma 1 weights remain available for reproducibility, legacy deployments, and research that specifically targets the original release.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// Gemma (open weights) — every version

// FAQ