AI/TLDR

FastEmbed

Lightweight Python library for dense, sparse, and late-interaction embeddings

Overview

FastEmbed is a Python library from Qdrant for generating text and image embeddings. Instead of pulling in large PyTorch dependencies, it runs models through ONNX Runtime, which keeps the install small and works without a GPU. That makes it a good fit for serverless environments like AWS Lambda, where package size and cold-start time matter.

It is aimed at developers building retrieval and search systems. The same library covers the main embedding types you need for hybrid retrieval: dense vectors (the default is BAAI/bge-small-en-v1.5), sparse vectors with SPLADE++, and late-interaction models like ColBERT. It also supports image embeddings, ColPali-style multimodal late interaction, and cross-encoder rerankers.

In the RAG and retrieval category, FastEmbed sits at the embedding and reranking layer. It pairs directly with Qdrant for storing and querying vectors, but the embeddings it produces are plain NumPy arrays you can use with any vector store.
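
Because the output is ordinary NumPy data, ranking documents against a query needs nothing beyond NumPy itself. The following is a minimal sketch, not part of FastEmbed: the `cosine_similarity` helper and the 4-dimensional toy vectors are made up for illustration and stand in for real embeddings.

```python
import numpy as np


def cosine_similarity(query_vec: np.ndarray, doc_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of doc_vecs."""
    query_unit = query_vec / np.linalg.norm(query_vec)
    doc_units = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return doc_units @ query_unit


# Toy vectors standing in for real embeddings (hypothetical values).
doc_vecs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
])
query_vec = np.array([1.0, 0.0, 0.0, 0.0])

scores = cosine_similarity(query_vec, doc_vecs)
ranking = np.argsort(-scores)  # document indices, best match first
print(ranking.tolist())  # [0, 2, 1]
```

With real embeddings, `doc_vecs` would be the stacked document vectors (for example via `np.stack`) and `query_vec` the embedding of the query.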

What it does

  • Dense text embeddings via TextEmbedding, with a list of supported models and the ability to add custom ONNX models
  • Sparse text embeddings using SPLADE++ (SparseTextEmbedding) for keyword-aware hybrid search
  • Late-interaction embeddings with ColBERT (LateInteractionTextEmbedding) for fine-grained matching
  • Image and multimodal embeddings, including CLIP image vectors and ColPali-style late-interaction multimodal embeddings
  • Cross-encoder rerankers (TextCrossEncoder) to re-score query-document pairs
  • Runs on ONNX Runtime with few dependencies and no required GPU, suitable for serverless deployments
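
Late-interaction models such as ColBERT produce one vector per token rather than one vector per text, and relevance is commonly scored with the MaxSim operator: for each query token, take its best-matching document token, then sum those maxima. The sketch below shows that scoring step on toy data. The `maxsim_score` helper and the 2-dimensional token vectors are hypothetical; in practice a vector store that supports multivectors would usually do this for you.

```python
import numpy as np


def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum, over query tokens, of the highest dot product with any document token."""
    sim = query_tokens @ doc_tokens.T  # shape: (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())


# Toy per-token embeddings (hypothetical values, 2 dimensions per token).
query_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a_tokens = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
doc_b_tokens = np.array([[1.0, 0.0], [1.0, 0.0]])

print(maxsim_score(query_tokens, doc_a_tokens))  # 2.0: both query tokens find a match
print(maxsim_score(query_tokens, doc_b_tokens))  # 1.0: only the first query token matches
```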

Getting started

Install the package with pip, then load a model and embed a list of documents.

Install FastEmbed

Install with pip. Use the fastembed-gpu package if you want GPU support.

bash
pip install fastembed

# or with GPU support
pip install fastembed-gpu

Generate dense embeddings

Create a TextEmbedding model and call embed() on your documents. The first run downloads the default model (BAAI/bge-small-en-v1.5).

python
from fastembed import TextEmbedding

documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

embedding_model = TextEmbedding()
embeddings_list = list(embedding_model.embed(documents))
print(len(embeddings_list[0]))  # 384 dimensions for the default model

Add sparse or reranker models

For hybrid search, pull in SPLADE++ sparse vectors or a cross-encoder reranker. Each uses its own class.

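python
from fastembed import SparseTextEmbedding
from fastembed.rerank.cross_encoder import TextCrossEncoder

# Reuses the `documents` list from the dense example above.

# SPLADE++ sparse vectors for the keyword-aware side of hybrid search
sparse = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
sparse_embeddings = list(sparse.embed(documents))

# Cross-encoder relevance score for each (query, document) pair
encoder = TextCrossEncoder(model_name="Xenova/ms-marco-MiniLM-L-6-v2")
scores = list(encoder.rerank("Who is maintaining Qdrant?", documents))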

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.
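
A sparse vector stores only its non-zero entries, typically as parallel arrays of token indices and weights, and two sparse vectors are compared by a dot product over the indices they share. The sketch below illustrates that idea in plain Python. It assumes the index/weight layout just described; `sparse_dot` and the sample numbers are hypothetical and not part of FastEmbed's API.

```python
def sparse_dot(
    indices_a: list[int],
    values_a: list[float],
    indices_b: list[int],
    values_b: list[float],
) -> float:
    """Dot product of two sparse vectors given as (indices, values) pairs."""
    weights_a = dict(zip(indices_a, values_a))
    # Only indices present in both vectors contribute to the score.
    return sum(weights_a.get(i, 0.0) * v for i, v in zip(indices_b, values_b))


# Toy sparse vectors (hypothetical token indices and weights).
query_indices, query_values = [3, 17, 42], [0.5, 1.0, 2.0]
doc_indices, doc_values = [17, 42, 99], [2.0, 0.5, 3.0]

score = sparse_dot(query_indices, query_values, doc_indices, doc_values)
print(score)  # 3.0, from shared indices 17 and 42
```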

When to use it

  • Generating dense vectors for documents and queries to power semantic search in a vector store
  • Building hybrid search that combines dense vectors with SPLADE++ sparse vectors
  • Re-scoring top retrieval results with a cross-encoder reranker to improve ranking quality
  • Running embeddings in size-constrained or serverless environments without GPUs or large PyTorch installs
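
For the hybrid case, the dense and sparse result lists have to be merged into one ranking. One common option is Reciprocal Rank Fusion (RRF), which only needs each document's rank in each list. The sketch below is one possible approach under that assumption, not something FastEmbed provides: the `reciprocal_rank_fusion` helper, the constant `k=60` (a frequently used default), and the document IDs are all illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a dense search and a sparse search.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print([doc_id for doc_id, _ in fused])  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists rise to the top, and the fused shortlist is a natural input for a cross-encoder reranker.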

How FastEmbed compares

FastEmbed alongside the other open-source reranking, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy | ★ 15.4k | A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval.
FlagEmbedding | ★ 11.8k | BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines.
Vespa | ★ 7k | A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval.
RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
FastEmbed | ★ 3k | Lightweight Python library for dense, sparse, and late-interaction embeddings