Overview
FastEmbed is a Python library from Qdrant for generating text and image embeddings. Instead of pulling in large PyTorch dependencies, it runs models through the ONNX Runtime, which keeps the install small and works without a GPU. That makes it a good fit for serverless environments like AWS Lambda where size and cold-start time matter.
It is aimed at developers building retrieval and search systems. The same library covers the main embedding types you need for hybrid retrieval: dense vectors (the default is BAAI/bge-small-en-v1.5), sparse vectors with SPLADE++, and late-interaction models like ColBERT. It also supports image embeddings, ColPali-style multimodal late interaction, and cross-encoder rerankers.
In the RAG and retrieval category, FastEmbed sits at the embedding and reranking layer. It pairs directly with Qdrant for storing and querying vectors, but the embeddings it produces are plain NumPy arrays you can use with any vector store.
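Because the output is just arrays of floats, a small collection can be searched without any vector store. The sketch below is a minimal illustration of that: it ranks documents by cosine similarity using made-up three-dimensional vectors in place of real embeddings. The helpers `cosine` and `rank_by_cosine` are hypothetical names, not part of FastEmbed, and plain lists are used only to keep the example dependency-free; the same arithmetic applies to NumPy arrays.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query: list[float], docs: list[list[float]]) -> list[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy vectors for illustration only; real embeddings would come from a model.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # in between
]
ranking = rank_by_cosine(query_vec, doc_vecs)
print(ranking)
```

A brute-force scan like this is linear in the number of documents, so it is only reasonable for small collections; beyond that, an indexed vector store is the usual choice.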
What it does
- Dense text embeddings via TextEmbedding, with a list of supported models and the ability to add custom ONNX models
- Sparse text embeddings using SPLADE++ (SparseTextEmbedding) for keyword-aware hybrid search
- Late-interaction embeddings with ColBERT (LateInteractionTextEmbedding) for fine-grained matching
- Image and multimodal embeddings, including CLIP image vectors and ColPali late-interaction multimodal
- Cross-encoder rerankers (TextCrossEncoder) to re-score query-document pairs
- Runs on ONNX Runtime with few dependencies and no required GPU, suitable for serverless deployments
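To make the late-interaction item concrete: ColBERT-style models keep one vector per token, and a document is typically scored with MaxSim, where each query token takes its best match among the document tokens and those maxima are summed. The sketch below uses toy two-dimensional token vectors and a hypothetical `maxsim` helper. It illustrates the scoring rule only and is not FastEmbed's API.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token vectors for illustration only.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first query token

score_a = maxsim(query_tokens, doc_a)
score_b = maxsim(query_tokens, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores higher because every query token finds a close counterpart, which is the token-level matching that a single pooled vector cannot express.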
Getting started
Install the package with pip, then load a model and embed a list of documents.
Install FastEmbed
Install with pip. Use the fastembed-gpu package if you want GPU support.
pip install fastembed
# or with GPU support
pip install fastembed-gpu
Generate dense embeddings
Create a TextEmbedding model and call embed() on your documents. The first run downloads the default model (BAAI/bge-small-en-v1.5).
from fastembed import TextEmbedding
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]
embedding_model = TextEmbedding()
embeddings_list = list(embedding_model.embed(documents))
len(embeddings_list[0])  # Vector of 384 dimensions
Add sparse or reranker models
For hybrid search, pull in SPLADE++ sparse vectors or a cross-encoder reranker. Each uses its own class.
from fastembed import SparseTextEmbedding
from fastembed.rerank.cross_encoder import TextCrossEncoder
# Reuses the `documents` list defined in the dense embedding example.
sparse = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
sparse_embeddings = list(sparse.embed(documents))
# The reranker returns one relevance score per document for the given query.
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank("Who is maintaining Qdrant?", documents))
Commands and code are distilled from the project's own documentation. Always check the official repo for the latest.
When to use it
- Generating dense vectors for documents and queries to power semantic search in a vector store
- Building hybrid search that combines dense vectors with SPLADE++ sparse vectors
- Re-scoring top retrieval results with a cross-encoder reranker to improve ranking quality
- Running embeddings in size-constrained or serverless environments without GPUs or large PyTorch installs
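For the hybrid case, the dense and sparse result lists have to be merged into one ranking. One common option is reciprocal rank fusion (RRF). The sketch below assumes two already-ranked lists of hypothetical document ids and the conventional constant k = 60; `reciprocal_rank_fusion` is an illustrative helper, not something FastEmbed provides.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)), with rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the result is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best hit first.
dense_hits = ["doc2", "doc1", "doc3"]
sparse_hits = ["doc1", "doc4", "doc2"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF works on ranks rather than raw scores, so it sidesteps the problem that dense similarities and sparse scores live on different scales.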
How FastEmbed compares
FastEmbed alongside the other open-source reranking, search, and hybrid retrieval tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval. |
| Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search. |
| Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API. |
| Tantivy | ★ 15.4k | A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval. |
| FlagEmbedding | ★ 11.8k | BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines. |
| Vespa | ★ 7k | A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval. |
| RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines. |
| FastEmbed | ★ 3k | Lightweight Python library for dense, sparse, and late-interaction embeddings. |