Overview
FlagEmbedding is an open-source retrieval toolkit from BAAI (Beijing Academy of Artificial Intelligence). It packages the widely used BGE family of models, including dense text embedding models and cross-encoder rerankers, behind a single Python library so you can turn text into vectors and score query-document pairs.
It is aimed at developers building search and RAG (retrieval-augmented generation) pipelines. You load a model from Hugging Face by name, call encode to get embeddings, and compute similarity to find relevant passages. The library also covers the multilingual BGE-M3 model, which supports dense, lexical, and multi-vector retrieval in one model.
Within the rerankers and hybrid-search category, FlagEmbedding gives you both halves of a retrieval stack: embedding models for the first-stage recall and reranker models to reorder the top candidates before they reach your LLM.
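The two-stage flow described above can be sketched in plain Python. This is an illustrative outline only: `recall_top_k`, `rerank`, and `toy_overlap_scorer` are hypothetical helpers, the vectors are made-up stand-ins for real embeddings, and the word-overlap scorer stands in for a real reranker model.

```python
from typing import Callable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall_top_k(query_vec: Sequence[float],
                 passage_vecs: List[Sequence[float]],
                 k: int) -> List[int]:
    """Stage 1: rank every passage by dot product and keep the k best indices."""
    ranked = sorted(range(len(passage_vecs)),
                    key=lambda i: dot(query_vec, passage_vecs[i]),
                    reverse=True)
    return ranked[:k]


def rerank(query: str,
           passages: List[str],
           candidates: List[int],
           score_pairs: Callable[[List[Tuple[str, str]]], List[float]]) -> List[int]:
    """Stage 2: rescore only the recalled candidates with a (query, passage) scorer."""
    pairs = [(query, passages[i]) for i in candidates]
    scores = score_pairs(pairs)
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return [candidates[j] for j in order]


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for a reranker: counts words shared by query and passage."""
    return [float(len(set(q.lower().split()) & set(p.lower().split())))
            for q, p in pairs]


# Made-up passages and 2-d "embeddings", for illustration only.
passages = ["BGE models embed text",
            "Cats sleep a lot",
            "Rerankers reorder search results"]
passage_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
query = "how do rerankers work"
query_vec = [1.0, 0.0]

candidates = recall_top_k(query_vec, passage_vecs, k=2)
final_order = rerank(query, passages, candidates, toy_overlap_scorer)
print(candidates, final_order)
```

In a real pipeline the stage-1 vectors would come from an embedding model and `score_pairs` would wrap a reranker model; check the official repo for the exact reranker call before relying on it.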
What it does
- BGE embedding models for dense text retrieval, loaded by name from Hugging Face
- Cross-encoder reranker models (e.g. bge-reranker-v2-m3) for reordering retrieved passages
- BGE-M3 supports 100+ languages, inputs up to 8192 tokens, and dense, lexical, and multi-vector retrieval in one model
- Simple Python API: from_finetuned to load a model, encode to embed, then matrix-multiply for similarity
- Optional fp16 inference (use_fp16=True) to reduce memory and speed up encoding
- Optional finetuning extras installable via the FlagEmbedding[finetune] package
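To make the three BGE-M3 retrieval modes concrete, here is a minimal sketch of how each kind of score is commonly computed and then fused. It does not call the library: the function names, the toy inputs, and the fusion weights are all assumptions chosen for illustration, and real lexical weights are keyed by tokenizer tokens rather than whole words.

```python
from typing import Dict, List, Sequence


def dense_score(q: Sequence[float], d: Sequence[float]) -> float:
    """Dense retrieval: one vector per text, compared with a dot product."""
    return sum(x * y for x, y in zip(q, d))


def lexical_score(q_weights: Dict[str, float], d_weights: Dict[str, float]) -> float:
    """Lexical retrieval: sum of weight products over tokens both texts share."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def multi_vector_score(q_vecs: List[Sequence[float]],
                       d_vecs: List[Sequence[float]]) -> float:
    """Multi-vector (late interaction): each query token takes its best-matching
    document token, and the per-token maxima are averaged."""
    best = [max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs]
    return sum(best) / len(best)


def hybrid_score(dense: float, lexical: float, multi: float,
                 weights: Sequence[float] = (0.5, 0.25, 0.25)) -> float:
    """Weighted sum of the three scores. The weights are hypothetical defaults."""
    w_dense, w_lex, w_multi = weights
    return w_dense * dense + w_lex * lexical + w_multi * multi


# Toy inputs, not real model output.
d_score = dense_score([1.0, 0.0], [0.5, 0.5])
l_score = lexical_score({"bge": 0.5, "m3": 0.5}, {"bge": 0.5, "model": 0.25})
m_score = multi_vector_score([[1.0, 0.0], [0.0, 1.0]],
                             [[0.5, 0.0], [0.0, 0.25]])
combined = hybrid_score(d_score, l_score, m_score)
print(d_score, l_score, m_score, combined)
```

The fusion weights usually need tuning per dataset; a weighted sum is only one of several ways to combine the three signals.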
Getting started
Install the library from PyPI, then load a BGE model and encode a couple of sentences to measure their similarity.
Install FlagEmbedding
Install the package from PyPI. Add the finetune extra only if you plan to train models.
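pip install -U FlagEmbedding
Load an embedding model
Load a BGE embedding model by its Hugging Face name. The retrieval instruction is a prompt prefix meant for search queries rather than passages; the project's documentation describes encode_queries as the method that adds it to each query.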
from FlagEmbedding import FlagAutoModel
model = FlagAutoModel.from_finetuned(
    'BAAI/bge-base-en-v1.5',
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
    use_fp16=True,
)
Encode text and compare
Encode two lists of sentences, then take the dot product of the embeddings to get a similarity matrix.
sentences_1 = ["I love NLP", "I love machine learning"]
sentences_2 = ["I love BGE", "I love text retrieval"]
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
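To show what the matrix product in the similarity step computes, the sketch below rebuilds it in plain Python on made-up vectors. The helper names are hypothetical, and the vectors are normalized explicitly here instead of assuming the library returns unit-length embeddings; for unit-length vectors, the dot product equals cosine similarity.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(a: List[Sequence[float]],
                      b: List[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the dot product of a[i] and b[j], i.e. the result of a @ b.T."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b] for row in a]


# Toy 2-d vectors standing in for sentence embeddings.
emb_1 = [normalize([3.0, 4.0]), normalize([1.0, 0.0])]
emb_2 = [normalize([0.0, 2.0]), normalize([4.0, 3.0])]

sim = similarity_matrix(emb_1, emb_2)

# For each row in emb_1, the index of its closest match in emb_2.
best_match = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(sim, best_match)
```

Each row of the result corresponds to one sentence from the first list and each column to one sentence from the second, so the matrix for the two-by-two example above has shape 2 x 2.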
When to use it
- Generate text embeddings for the retrieval step of a RAG pipeline
- Rerank the top candidates from a vector search before passing them to an LLM
- Build multilingual search using BGE-M3 across 100+ languages
- Run semantic similarity or deduplication over a collection of documents
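As one example of the deduplication use case, the following sketch greedily drops documents whose embedding is too close to one already kept. The `deduplicate` helper, the 0.9 threshold, and the toy vectors are assumptions for illustration; real embeddings would come from the model's encode call.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Keep a document only if it is below the threshold against every kept one.

    Returns the indices of the documents that survive, in original order.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Made-up documents and 2-d vectors; the first two are near-duplicates.
docs = ["FlagEmbedding ships BGE models.",
        "FlagEmbedding ships the BGE models.",
        "Tantivy is a Rust search library."]
toy_embeddings = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]

kept = deduplicate(toy_embeddings)
print([docs[i] for i in kept])
```

This greedy pass compares each document against every kept one, which is quadratic in the worst case; larger collections would typically use an approximate nearest-neighbor index instead.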
How FlagEmbedding compares
FlagEmbedding alongside the other open-source reranking, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval. |
| Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search. |
| Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API. |
| Tantivy | ★ 15.4k | A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval. |
| FlagEmbedding | ★ 11.8k | One-stop retrieval toolkit with the BGE embedding and reranker models for search and RAG. |
| Vespa | ★ 7k | A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval. |
| RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines. |
| ColBERT | ★ 3.9k | The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching. |