Overview
Sentence Transformers is a Python framework for using and training sentence embedding, reranker (cross-encoder), and sparse encoder models. For embeddings, you load a pretrained model, call encode() on your texts, and get back vectors you can compare with a similarity function. It works with over 15,000 pretrained models on Hugging Face, including many from the MTEB leaderboard.
It is aimed at developers building semantic search, retrieval, and text-similarity features, as well as teams that want to train or finetune their own embedding and reranker models. The same library covers both the dense embeddings used to find candidate documents and the cross-encoder rerankers used to score those candidates more precisely.
Within the embeddings category, it is a widely used high-level toolkit for turning text into vectors and reranking results. It also supports sparse encoder models (such as SPLADE) for keyword-style sparse representations, so you can mix dense, sparse, and reranking stages in one pipeline.
What it does
- Compute dense text embeddings with a one-line model.encode() call that returns numpy arrays
- Score and rank query-passage pairs with CrossEncoder reranker models via predict() or rank()
- Generate sparse embeddings with SparseEncoder models like SPLADE, including sparsity stats
- Compare embeddings with the built-in model.similarity() helper
- Access 15,000+ pretrained models on Hugging Face, covering 100+ languages
- Train or finetune your own embedding, reranker, and sparse encoder models
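To make the "keyword-style sparse representations" mentioned above more concrete, here is a minimal sketch in plain Python. It does not use the library's SparseEncoder API. The token weights, the helper names sparse_dot and sparsity, and the vocabulary size are all hypothetical, chosen only to illustrate how a sparse vector can be scored and how a sparsity statistic might be computed.

```python
# Illustration only: sparse vectors stored as {token: weight} dicts.
# All weights and the vocabulary size below are made-up assumptions.

VOCAB_SIZE = 30_000  # hypothetical vocabulary size


def sparse_dot(a, b):
    """Dot product of two sparse vectors; only shared tokens contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b[token] for token, weight in a.items() if token in b)


def sparsity(vec, vocab_size):
    """Fraction of vocabulary dimensions that are zero in this vector."""
    return 1.0 - len(vec) / vocab_size


query = {"berlin": 1.5, "population": 2.0}
doc_a = {"berlin": 1.0, "population": 1.5, "inhabitants": 0.5}
doc_b = {"berlin": 1.0, "visitors": 2.0}

# doc_a shares two weighted tokens with the query, doc_b only one.
print(sparse_dot(query, doc_a))
print(sparse_dot(query, doc_b))
print(sparsity(doc_a, VOCAB_SIZE))
```

Because only a handful of dimensions are non-zero, scoring reduces to a lookup over shared tokens, which is why sparse representations behave much like weighted keyword matching.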
Getting started
Install the package, then load a pretrained model and encode some text.
Install
Install from PyPI. Python 3.10+, PyTorch 1.11.0+, and transformers v4.41.0+ are recommended.
pip install -U sentence-transformers
Compute embeddings
Load a Sentence Transformer model and encode a list of sentences into vectors.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# => (3, 384)
Compare similarity
Use the model's similarity helper to compare the embeddings.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
Rerank with a Cross Encoder
Load a reranker model and rank passages for a query without manual sorting.
from sentence_transformers import CrossEncoder
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin has a yearly total of about 135 million day visitors, making it one of the most-visited cities in the European Union.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]
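ranks = model.rank(query, passages, return_documents=True)
# Each result is a dict holding the passage's index, relevance score, and text.
for rank in ranks:
    print(f"#{rank['corpus_id']} ({rank['score']:.2f}): {rank['text']}")
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.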
When to use it
- Build semantic search over a document collection by embedding the corpus and matching queries by similarity
- Rerank an initial set of retrieved candidates with a cross-encoder for more accurate top results in a RAG pipeline
- Measure semantic textual similarity or run paraphrase mining across large text sets
- Train or finetune a custom embedding or reranker model for a domain-specific dataset
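The first use case above, matching queries to an embedded corpus by similarity, can be sketched without the library at all. The example below is an illustration under stated assumptions: the 3-dimensional vectors are hypothetical stand-ins for what model.encode() would return, and cosine and search are helper names invented here, not part of Sentence Transformers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, corpus_vecs, top_k=2):
    """Return (corpus index, score) pairs for the top_k most similar vectors."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real embeddings.
corpus_vecs = [
    [0.9, 0.1, 0.0],  # document 0
    [0.1, 0.9, 0.1],  # document 1
    [0.8, 0.3, 0.1],  # document 2
]
query_vec = [1.0, 0.2, 0.0]

hits = search(query_vec, corpus_vecs, top_k=2)
for index, score in hits:
    print(f"document {index}: {score:.3f}")
```

In a real pipeline the top hits from a step like this would typically be handed to a cross-encoder for reranking, as in the second use case. For large corpora, a brute-force loop like this one is usually replaced by batched matrix operations or a vector index.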
How Sentence Transformers compares
Sentence Transformers alongside the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Sentence Transformers | ★ 18.8k | Compute, train, and rerank text embeddings in Python |
| EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search. |
| Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU. |
| Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API. |
| ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first. |
| Model2Vec | ★ 2.1k | A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference. |
| Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction. |
| Qwen3-Embedding | ★ 2k | Alibaba's open embedding and reranking models built on the Qwen3 base, available in 0.6B/4B/8B sizes and covering over 100 languages. |