AI/TLDR

Sentence Transformers

Compute, train, and rerank text embeddings in Python

Overview

Sentence Transformers is a Python framework for working with sentence embedding, reranker (cross-encoder), and sparse encoder models. You load a pretrained model, call encode() on your texts, and get back vectors you can compare with a similarity function. It works with over 15,000 pretrained models on Hugging Face, including many from the MTEB leaderboard.

It is aimed at developers building semantic search, retrieval, and text-similarity features, as well as teams that want to train or finetune their own embedding and reranker models. The same library covers both the dense embeddings used to find candidate documents and the cross-encoder rerankers used to score those candidates more precisely.

Within the embeddings category, it is a widely used high-level toolkit for turning text into vectors and reranking results. It also supports sparse encoder models (such as SPLADE) for keyword-style sparse representations, so you can mix dense, sparse, and reranking stages in one pipeline.

What it does

  • Compute dense text embeddings with a one-line model.encode() call that returns numpy arrays
  • Score and rank query-passage pairs with CrossEncoder reranker models via predict() or rank()
  • Generate sparse embeddings with SparseEncoder models like SPLADE, including sparsity stats
  • Compare embeddings with the built-in model.similarity() helper
  • Load any of 15,000+ pretrained models on Hugging Face, covering 100+ languages
  • Train or finetune your own embedding, reranker, and sparse encoder models

Getting started

Install the package, then load a pretrained model and encode some text.

Install

Install from PyPI. Python 3.10+, PyTorch 1.11.0+, and transformers v4.41.0+ are recommended.

bash
pip install -U sentence-transformers

Compute embeddings

Load a Sentence Transformer model and encode a list of sentences into vectors.

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# => (3, 384)

Compare similarity

Use the model's similarity helper to compare the embeddings.

python
# Reuses `model` and `embeddings` from the previous snippet.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
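
For intuition, here is a minimal sketch of what a pairwise similarity matrix like the one above contains. It assumes the similarity function is cosine similarity, which the 1.0000 diagonal in the output is consistent with; other models may be configured to use a different function. The vectors are made-up 3-dimensional stand-ins for real embeddings, and the helper names `cosine` and `similarity_matrix` are illustrative, not part of the library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Made-up vectors standing in for real embeddings (assumption for illustration).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 4) for score in row])
```

Each vector scores 1.0 against itself, vectors pointing in similar directions score close to 1, and orthogonal vectors score 0.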

Rerank with a Cross Encoder

Load a reranker model and rank passages for a query without manual sorting.

python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin has a yearly total of about 135 million day visitors, making it one of the most-visited cities in the European Union.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]
ranks = model.rank(query, passages, return_documents=True)
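
rank() returns a list of dictionaries sorted from best to worst match, each with a corpus_id and a score, plus the passage text when return_documents=True. The sketch below shows one way to print such a result. The `format_ranks` helper is hypothetical, and the entries in `example_ranks` are invented placeholders in the same shape, not real model output.

```python
def format_ranks(ranks):
    """Turn rank()-style result dicts into printable lines, best match first."""
    return [
        f"#{r['corpus_id']} ({r['score']:.2f}): {r['text']}"
        for r in ranks
    ]


# Invented placeholder results in the shape rank() returns (not real scores).
example_ranks = [
    {"corpus_id": 0, "score": 10.67, "text": "first passage"},
    {"corpus_id": 2, "score": 9.76, "text": "third passage"},
    {"corpus_id": 1, "score": 9.73, "text": "second passage"},
]

for line in format_ranks(example_ranks):
    print(line)
```

With the real model, you would pass the `ranks` list from the snippet above instead of `example_ranks`.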

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Build semantic search over a document collection by embedding the corpus and matching queries by similarity
  • Rerank an initial set of retrieved candidates with a cross-encoder for more accurate top results in a RAG pipeline
  • Measure semantic textual similarity or run paraphrase mining across large text sets
  • Train or finetune a custom embedding or reranker model for a domain-specific dataset
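
The first two use cases are often chained: retrieve a handful of candidates by embedding similarity, then rerank only those with a cross-encoder. The retrieval step can be sketched as below. This is a plain-Python illustration, not the library's implementation: it assumes the embeddings are already computed and normalized to unit length (so the dot product equals cosine similarity), uses made-up 2-dimensional vectors, and the `top_k` helper is hypothetical.

```python
import heapq


def top_k(query_vec, corpus_vecs, k=2):
    """Return (corpus index, score) pairs for the k highest dot-product scores."""
    scores = [
        (idx, sum(q * d for q, d in zip(query_vec, doc_vec)))
        for idx, doc_vec in enumerate(corpus_vecs)
    ]
    # nlargest returns the pairs sorted by score, highest first.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Made-up unit-length vectors standing in for corpus and query embeddings.
corpus_vecs = [
    [1.0, 0.0],
    [0.6, 0.8],
    [0.0, 1.0],
]
query_vec = [0.8, 0.6]

hits = top_k(query_vec, corpus_vecs, k=2)
for idx, score in hits:
    print(idx, round(score, 2))
```

The indices in `hits` identify the candidate passages you would then hand to a reranker, such as the CrossEncoder shown earlier.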

How Sentence Transformers compares

Sentence Transformers alongside the other open-source embedding models and inference tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Sentence Transformers | ★ 18.8k | Compute, train, and rerank text embeddings in Python
EmbeddingGemma (Gemma) | ★ 5.5k | Google DeepMind's Gemma repo, home to EmbeddingGemma, a 308M multilingual embedding model small enough to run on-device for RAG and semantic search.
Text Embeddings Inference (TEI) | ★ 4.9k | Hugging Face's Rust-based server for deploying embedding, reranking, and sequence-classification models with high throughput on GPU or CPU.
Infinity (Embeddings) | ★ 2.8k | A high-throughput serving engine for text embeddings, rerankers, CLIP, and ColPali models, exposing an OpenAI-compatible API.
ColPali | ★ 2.7k | A vision-language embedding model that indexes whole document page images for retrieval, avoiding the need to parse PDFs into text first.
Model2Vec | ★ 2.1k | A tool that distills any sentence transformer into a tiny, fast static embedding model (the Potion models) that runs on CPU without a neural network at inference.
Instructor Embedding | ★ 2k | Instruction-tuned text embedding models that let you tailor embeddings to a task by prepending a natural-language instruction.
Qwen3-Embedding | ★ 2k | Alibaba's open embedding and reranking models built on the Qwen3 base, available in 0.6B/4B/8B sizes and covering over 100 languages.