FastEmbed

Lightweight Python library for dense, sparse, and late-interaction embeddings

github.com/qdrant/fastembed · ★ 3k · qdrant.github.io/fastembed

Overview

FastEmbed is a Python library from Qdrant for generating text and image embeddings. Instead of pulling in large PyTorch dependencies, it runs models on ONNX Runtime, which keeps the install small and lets it run entirely on CPU. That makes it a good fit for serverless environments such as AWS Lambda, where package size and cold-start time matter.

It is aimed at developers building retrieval and search systems. The same library covers the main embedding types you need for hybrid retrieval: dense vectors (the default is BAAI/bge-small-en-v1.5), sparse vectors with SPLADE++, and late-interaction models like ColBERT. It also supports image embeddings, ColPali-style multimodal late interaction, and cross-encoder rerankers.

In the RAG and retrieval category, FastEmbed sits at the embedding and reranking layer. It pairs directly with Qdrant for storing and querying vectors, but its outputs are plain NumPy data (dense vectors as arrays, sparse vectors as index and value arrays) that you can use with any vector store.
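Because dense outputs are ordinary vectors, ranking them needs nothing FastEmbed-specific. Below is a minimal, store-agnostic sketch of brute-force cosine search. The helper names `cosine` and `top_k` are hypothetical (not part of FastEmbed), and a real vector store such as Qdrant would replace this linear scan with an index.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query, best first."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The functions only iterate over the vectors, so they should accept NumPy arrays such as the `embeddings_list` items from the Getting started example as well as plain Python lists.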

What it does

Dense text embeddings via TextEmbedding, with a list of supported models and the ability to add custom ONNX models
Sparse text embeddings using SPLADE++ (SparseTextEmbedding) for keyword-aware hybrid search
Late-interaction embeddings with ColBERT (LateInteractionTextEmbedding) for fine-grained matching
Image and multimodal embeddings, including CLIP image vectors and ColPali late-interaction multimodal
Cross-encoder rerankers (TextCrossEncoder) to re-score query-document pairs
Runs on ONNX Runtime with few dependencies and no required GPU, suitable for serverless deployments
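Late-interaction models such as ColBERT keep one vector per token instead of pooling a text into a single vector, then score a query against a document with MaxSim: each query token takes its best match among the document tokens, and those best matches are summed. A minimal sketch of that scoring rule in plain Python follows. `dot` and `maxsim` are hypothetical helper names, and we assume each text's late-interaction embedding is a 2-D array of token vectors, which is how we understand `LateInteractionTextEmbedding` output to be shaped.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """ColBERT-style late-interaction score.

    For every query token vector, find its maximum dot product against any
    document token vector, then sum those maxima over all query tokens.
    """
    if len(doc_tokens) == 0:  # len() rather than truthiness so 2-D NumPy arrays work too
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)
```

ColBERT token vectors are usually L2-normalized, so each dot product acts as a cosine similarity; ranking documents then means sorting them by `maxsim(query_tokens, doc_tokens)`.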

Getting started

Install the package with pip, then load a model and embed a list of documents.

Install FastEmbed

Install with pip. Use the fastembed-gpu package if you want GPU support.

```bash
pip install fastembed

# or with GPU support
pip install fastembed-gpu
```

Generate dense embeddings

Create a TextEmbedding model and call embed() on your documents. The embed() method returns a generator, so wrap it in list() to materialize the vectors. The first run downloads the default model (BAAI/bge-small-en-v1.5).

```python
from fastembed import TextEmbedding

documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# Loads the default model (downloaded on first use).
embedding_model = TextEmbedding()
# embed() yields one NumPy array per document; list() collects them.
embeddings_list = list(embedding_model.embed(documents))
print(len(embeddings_list[0]))  # 384 dimensions for the default model
```

Add sparse or reranker models

For hybrid search, pull in SPLADE++ sparse vectors or a cross-encoder reranker. Each uses its own class.

```python
from fastembed import SparseTextEmbedding
from fastembed.rerank.cross_encoder import TextCrossEncoder

# Reuses the `documents` list from the dense example above.
sparse = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
sparse_embeddings = list(sparse.embed(documents))

# One relevance score per document for this query; higher means more relevant.
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank("Who is maintaining Qdrant?", documents))
```
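To use these outputs directly, you need two small operations: scoring sparse vectors against each other and ordering documents by reranker score. The sketch below assumes each SPLADE++ result exposes parallel `indices` and `values` arrays (our understanding of FastEmbed's `SparseEmbedding`); the helpers `sparse_dot` and `order_by_scores` are hypothetical names, not library functions.

```python
from typing import Sequence


def sparse_dot(
    q_indices: Sequence[int], q_values: Sequence[float],
    d_indices: Sequence[int], d_values: Sequence[float],
) -> float:
    """Dot product of two sparse vectors given as parallel index/value arrays.

    Only vocabulary indices present in both vectors contribute to the score.
    """
    doc = {int(i): float(v) for i, v in zip(d_indices, d_values)}
    return sum(float(v) * doc.get(int(i), 0.0) for i, v in zip(q_indices, q_values))


def order_by_scores(docs: Sequence[str], scores: Sequence[float]) -> list[tuple[str, float]]:
    """Pair each document with its reranker score and sort best first."""
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
```

For two sparse results `q` and `d`, you would call `sparse_dot(q.indices, q.values, d.indices, d.values)`, and `order_by_scores(documents, scores)` turns the reranker output above into a ranked list.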

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Generating dense vectors for documents and queries to power semantic search in a vector store
Building hybrid search that combines dense vectors with SPLADE++ sparse vectors
Re-scoring top retrieval results with a cross-encoder reranker to improve ranking quality
Running embeddings in size-constrained or serverless environments without GPUs or large PyTorch installs
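Hybrid search needs a way to merge the dense and sparse result lists. Reciprocal Rank Fusion (RRF) is one common choice, shown here only as an illustration; it is not necessarily what FastEmbed or Qdrant does internally. Each document earns 1 / (k + rank) from every list it appears in, with k = 60 as a commonly used default. The function name `reciprocal_rank_fusion` is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> list[tuple[Hashable, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Ranks are 1-based; a document missing from a list gets nothing from it.
    Returns (doc_id, fused_score) pairs, best first.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because RRF uses only rank positions, it sidesteps the fact that dense cosine scores and sparse SPLADE++ scores live on different scales; the fused top results can then be passed to a cross-encoder reranker.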

How FastEmbed compares

FastEmbed alongside other open-source rerank, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Elasticsearch	★ 77.1k	Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud	★ 58.2k	Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud	★ 26.1k	Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy	★ 15.4k	A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval.
FlagEmbedding	★ 11.8k	BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines.
Vespa	★ 7k	A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval.
RAGatouille	★ 3.9k	A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
FastEmbed	★ 3k	Lightweight Python library for dense, sparse, and late-interaction embeddings
