Overview
Model2Vec is a Python library that turns a regular sentence transformer into a small, static embedding model. Instead of running a neural network on every input, a static model looks up pre-computed token embeddings and pools them into a single vector. This makes it much smaller and faster than the original model while retaining most of its quality. The maintainers report size reductions of up to 50x and speed-ups of up to 500x, with a small drop in performance.
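The toy sketch below makes the idea of a static model concrete in plain Python. Everything in it is hypothetical: the four-word vocabulary, the 3-dimensional vectors and the whitespace tokenizer. It also assumes simple mean pooling. A real Model2Vec model uses its own subword tokenizer and a much larger embedding table.

Each known token maps to a fixed vector, which gives the per-token sequence. Averaging those vectors gives the pooled sentence embedding. No neural network runs at any point.

```python
# Toy static embedding model: a fixed lookup table plus mean pooling.
# The vocabulary, vectors, and tokenizer below are hypothetical, for illustration only.
TOY_EMBEDDINGS = {
    "dangerous": [0.9, 0.1, 0.0],
    "go": [0.2, 0.8, 0.0],
    "alone": [0.7, 0.0, 0.3],
    "secret": [0.0, 0.4, 0.6],
}
DIM = 3


def toy_tokenize(text):
    """Naive tokenizer: split on whitespace, trim punctuation, lowercase."""
    return [tok.strip(".,!?'").lower() for tok in text.split()]


def toy_encode_as_sequence(text):
    """Look up one precomputed vector per known token; unknown tokens are skipped."""
    return [TOY_EMBEDDINGS[tok] for tok in toy_tokenize(text) if tok in TOY_EMBEDDINGS]


def toy_encode(text):
    """Pool the token vectors into one sentence vector (assumption: a plain mean)."""
    seq = toy_encode_as_sequence(text)
    if not seq:
        return [0.0] * DIM
    return [sum(col) / len(seq) for col in zip(*seq)]


sequence = toy_encode_as_sequence("It's dangerous to go alone!")
pooled = toy_encode("It's dangerous to go alone!")
print(len(sequence))  # 3 known tokens: dangerous, go, alone
print(pooled)
```

The only work done at inference time is a dictionary lookup and an average. This is why static models are cheap to run on CPU.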
It is aimed at developers who need text embeddings but want to run them on CPU without the overhead of a full transformer. You can use the ready-made Potion models from the Hugging Face hub, or distill your own static model from a sentence transformer in about 30 seconds on a CPU.
As an embedding tool, it suits tasks such as text classification, retrieval, clustering and building RAG systems. The embeddings it produces are standard dense vectors that you can feed into any of these pipelines.
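Each text becomes a plain vector, so retrieval comes down to comparing vectors. The sketch below ranks documents against a query by cosine similarity in plain Python. The document ids and 3-dimensional vectors are hypothetical stand-ins for what `model.encode` would return.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Hypothetical precomputed document embeddings (stand-ins for model.encode output).
docs = {
    "sword_tip": [0.9, 0.1, 0.0],
    "shop_hint": [0.1, 0.9, 0.1],
    "cave_map": [0.6, 0.2, 0.7],
}
query = [0.8, 0.2, 0.1]
print(top_k(query, docs))  # ['sword_tip', 'cave_map']
```

In a real pipeline you would store the document vectors in an array or a vector database rather than a dict. The scoring step stays the same.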
What it does
- Loads pre-trained Potion static models directly from the Hugging Face hub, ready to use
- Distills your own static model from any sentence transformer in about 30 seconds on a CPU
- Runs on CPU without a neural network at inference time, so models are small and encoding is fast
- Produces both pooled embeddings and per-token embedding sequences via encode and encode_as_sequence
- Optional training extra lets you fine-tune classification models on top of a distilled or pre-trained model
- Supports BPE and Unigram tokenizer backends, plus quantization and dimensionality reduction to shrink models further
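The list above mentions quantization and dimensionality reduction but does not say how Model2Vec implements them. The sketch below illustrates both ideas generically on a tiny, hypothetical embedding table. It is not the library's code.

- **Quantization:** symmetric int8 quantization with a single scale for the whole table. Python ints stand in for int8 storage.
- **Dimensionality reduction:** plain truncation to the first k dimensions. This is only sensible if dimensions are already ordered by importance, for example after PCA.

```python
def quantize_int8(table):
    """Symmetric int8 quantization with one scale for the whole table (codes in [-127, 127])."""
    max_abs = max(abs(x) for row in table for x in row) or 1.0
    scale = max_abs / 127.0
    codes = [[round(x / scale) for x in row] for row in table]
    return codes, scale


def dequantize(codes, scale):
    """Approximately recover the float table from the integer codes and the scale."""
    return [[c * scale for c in row] for row in codes]


def truncate_dims(table, k):
    """Keep the first k dimensions (assumes dimensions are ordered by importance, e.g. PCA)."""
    return [row[:k] for row in table]


# Hypothetical table: 2 tokens x 3 dimensions.
table = [[0.9, -0.1, 0.05], [0.2, 0.8, -0.4]]
q, scale = quantize_int8(table)
print(q)  # [[127, -14, 7], [28, 113, -56]]
restored = dequantize(q, scale)  # each value is within scale / 2 of the original
small = truncate_dims(table, 2)
```

Storing one byte per value instead of four shrinks the table roughly 4x. Truncation cuts size in proportion to the dimensions dropped. Both trade some accuracy for size.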
Getting started
Install the base package, then load a Potion model and create embeddings. If you want to build your own static model, install the distillation extra.
Install the base package
Install the lightweight base package with pip.
```bash
pip install model2vec
```
Load a model and make embeddings
Load a pre-trained Potion model from the Hugging Face hub and encode some text.
```python
from model2vec import StaticModel

# Load a model from the Hugging Face hub (in this case the potion-base-32M model)
model = StaticModel.from_pretrained("minishlab/potion-base-32M")

# Make embeddings: one vector per input sentence
embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])

# Make sequences of token embeddings: one set of token vectors per input sentence
token_embeddings = model.encode_as_sequence(["It's dangerous to go alone!", "It's a secret to everybody."])
```
Distill your own model (optional)
Install the distillation extra, then distill a static model from a sentence transformer in about 30 seconds on a CPU.
```bash
pip install model2vec[distill]
```
Run the distillation
Distill a sentence transformer and save the resulting static model.
```python
from model2vec.distill import distill

# Distill a Sentence Transformer model, in this case the BAAI/bge-base-en-v1.5 model
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5")

# Save the model
m2v_model.save_pretrained("m2v_model")
```
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Add fast, CPU-only text embeddings to a retrieval or RAG pipeline without hosting a full transformer
- Shrink an existing sentence transformer into a smaller static model for resource-constrained or edge deployments
- Generate vectors for text classification and clustering at high throughput
- Embed multilingual text using the pre-trained potion-multilingual-128M model
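Here is one simple way to use these vectors for classification: a nearest-centroid classifier in plain Python. This is not Model2Vec's optional training extra, which fine-tunes a classifier on top of the model. The labels and 3-dimensional vectors are hypothetical stand-ins for `model.encode` output.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_centroid(train, query):
    """train maps label -> list of embeddings; return the label whose centroid is closest."""
    centroids = {label: centroid(vecs) for label, vecs in train.items()}
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Hypothetical labeled embeddings (stand-ins for model.encode output).
train = {
    "warning": [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]],
    "hint": [[0.1, 0.9, 0.1], [0.0, 0.7, 0.3]],
}
print(nearest_centroid(train, [0.7, 0.2, 0.1]))  # warning
```

Fast embeddings keep this kind of baseline cheap even on large datasets. The same vectors could also feed a clustering algorithm such as k-means.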
How Model2Vec compares
Model2Vec alongside other open-source embedding models and inference tools tracked by AI/TLDR, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Sentence Transformers | ★ 18.8k | The standard Python framework for loading, training, and computing embeddings with sentence and reranking models. |
| EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M-parameter multilingual embedding model small enough to run on-device for RAG and semantic search. |
| Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU. |
| Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API. |
| ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first. |
| Model2Vec | ★ 2.1k | Distills any sentence transformer into a tiny, fast static embedding model. |
| Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction. |
| Qwen3-Embedding | ★ 2k | Alibaba's open embedding and reranking models built on the Qwen3 base, available in 0.6B/4B/8B sizes and covering over 100 languages. |