EmbeddingGemma (Gemma)

A 308M-parameter multilingual embedding model small enough to run on-device

GitHub: github.com/google-deepmind/gemma (★ 5.5k)
Docs: ai.google.dev/gemma/docs/embeddinggemma

Overview

EmbeddingGemma is a 308M-parameter text embedding model from Google DeepMind, based on Gemma 3. It turns text into dense vectors that you can use for semantic search, retrieval-augmented generation (RAG), clustering, and similarity comparison. It was trained on data covering over 100 languages, so it works across multilingual content.

Because it is small, it is meant to run on-device or on modest hardware (CPU, GPU, or TPU) rather than requiring a large server. It also supports Matryoshka Representation Learning, which lets you shorten the output vector (for example from 768 down to 128 dimensions) to trade a little accuracy for lower storage and faster search.

It fits the embedding-models category: you generate embeddings once for your documents, store them in a vector index, and compare them against query embeddings at search time. The recommended way to run it is through the sentence-transformers library using the open weights published on Hugging Face.

What it does

308M-parameter embedding model based on Gemma 3, suited to on-device and CPU/GPU/TPU use
Trained on data spanning over 100 languages for multilingual text
Matryoshka Representation Learning: truncate output dimensions from 768 down to 128 to save storage
Open weights licensed for commercial use, published on Hugging Face as google/embeddinggemma-300M
Runs through the standard sentence-transformers API with model.encode() and model.similarity()
Built for RAG, semantic search, clustering, and similarity tasks

Getting started

EmbeddingGemma is loaded through the sentence-transformers library. Install the dependencies, then load the model from Hugging Face and call encode().

Install sentence-transformers

Install sentence-transformers along with the transformers build that supports EmbeddingGemma.

bash

pip install -U sentence-transformers git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview

Generate embeddings

Load the model by its Hugging Face ID and call encode() on your text. Use similarity() to compare two embeddings.

python

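import torch
from sentence_transformers import SentenceTransformer

# Use a GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("google/embeddinggemma-300M").to(device=device)

# encode() returns one embedding per input text.
texts = ["apple", "banana", "car"]
embeddings = model.encode(texts)

# Compare "apple" with "banana"; a higher score means closer in meaning.
similarities = model.similarity(embeddings[0], embeddings[1])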
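
To make the similarity score concrete, here is a minimal sketch of cosine similarity in plain Python. It assumes the model's similarity() is configured for cosine similarity, which is the sentence-transformers default but worth confirming for your install. The function name and the toy 3-dimensional vectors are illustrative stand-ins, not real EmbeddingGemma output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 768-dimensional embeddings (assumption).
apple = [0.9, 0.1, 0.0]
banana = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(apple, banana))  # close to 1: similar direction
print(cosine_similarity(apple, car))     # close to 0: nearly orthogonal
```

Scores near 1 mean the two vectors point in almost the same direction. Scores near 0 mean they are unrelated.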

Shrink the embedding dimension (optional)

Pass truncate_dim to get shorter vectors via Matryoshka Representation Learning, which reduces storage and speeds up search.

python

embeddings = model.encode(texts, truncate_dim=512)
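
As a rough illustration of what truncation means, the sketch below keeps the leading components of a vector, rescales it to unit length, and estimates raw storage. Both helper functions are hypothetical and written for this example. Whether the library renormalizes after truncation depends on the model configuration, so treat the rescaling step as an assumption. The storage figure assumes 4-byte float32 values and ignores index overhead.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be rescaled
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage, assuming float32 values and no index overhead."""
    return num_vectors * dim * bytes_per_value


full = [0.5, 0.5, 0.5, 0.5]  # toy 4-dimensional unit vector (assumption)
short = truncate_and_normalize(full, 2)
print(len(short))  # 2

print(storage_bytes(1_000_000, 768))  # 3072000000
print(storage_bytes(1_000_000, 128))  # 512000000
```

Under these assumptions, one million vectors shrink from about 3.07 GB at 768 dimensions to about 0.51 GB at 128 dimensions, a sixfold saving.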

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Build a RAG pipeline: embed your documents, store them in a vector index, and retrieve the closest chunks for a user query
Add semantic search to an app so results match meaning rather than exact keywords
Run embeddings on-device or on a CPU where a larger model would be too heavy
Embed multilingual content for cross-language search, clustering, or deduplication
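
The retrieval step in the first two use cases can be sketched as a brute-force search over stored vectors. TinyVectorIndex is a hypothetical helper written for this example, not part of sentence-transformers or any vector database. The document IDs and 3-dimensional vectors are made up, and in practice the vectors would come from model.encode().

```python
import heapq
import math


def normalize(vector):
    """Rescale a vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class TinyVectorIndex:
    """Brute-force in-memory index (illustrative only)."""

    def __init__(self):
        self._items = []  # list of (doc_id, unit_vector)

    def add(self, doc_id, vector):
        # Normalizing once at insert time lets search use a plain dot product.
        self._items.append((doc_id, normalize(vector)))

    def search(self, query_vector, k=3):
        q = normalize(query_vector)
        scored = (
            (sum(a * b for a, b in zip(q, vec)), doc_id)
            for doc_id, vec in self._items
        )
        # nlargest returns the k highest-scoring items, best first.
        return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


index = TinyVectorIndex()
index.add("fruit-faq", [0.9, 0.1, 0.0])
index.add("car-manual", [0.0, 0.1, 0.9])
index.add("smoothie-recipes", [0.8, 0.3, 0.1])

results = index.search([1.0, 0.0, 0.0], k=2)
print([doc_id for doc_id, _ in results])  # ['fruit-faq', 'smoothie-recipes']
```

A linear scan like this is fine for a few thousand documents. Larger collections usually call for an approximate nearest-neighbor index.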

How EmbeddingGemma (Gemma) compares

EmbeddingGemma (Gemma) compared with the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Sentence Transformers	★ 18.8k	The standard Python framework for loading, training, and computing embeddings with sentence and reranking models.
EmbeddingGemma (Gemma)	★ 5.5k	A 308M-parameter multilingual embedding model small enough to run on-device.
Text Embeddings Inference (TEI)	★ 4.9k	Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU.
Infinity (Embeddings)	★ 2.8k	A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API.
ColPali	★ 2.7k	A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first.
Model2Vec	★ 2.1k	A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference.
Instructor Embedding	★ 2k	Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction.
Qwen3-Embedding	★ 2k	Alibaba's open embedding and reranking models built on the Qwen3 base, available in 0.6B/4B/8B sizes and covering over 100 languages.
