AI/TLDR

FlagEmbedding

One-stop retrieval toolkit with the BGE embedding and reranker models for search and RAG

Overview

FlagEmbedding is an open-source retrieval toolkit from BAAI (Beijing Academy of Artificial Intelligence). It packages the widely used BGE family of models, including dense text embedding models and cross-encoder rerankers, behind a single Python library so you can turn text into vectors and score query-document pairs.

It is aimed at developers building search and RAG (retrieval-augmented generation) pipelines. You load a model from Hugging Face by name, call encode to get embeddings, and compute similarity to find relevant passages. The library also covers the multilingual BGE-M3 model, which supports dense, lexical, and multi-vector retrieval in one model.

Within the rerankers and hybrid-search category, FlagEmbedding gives you both halves of a retrieval stack: embedding models for the first-stage recall and reranker models to reorder the top candidates before they reach your LLM.

What it does

  • BGE embedding models for dense text retrieval, loaded by name from Hugging Face
  • Cross-encoder reranker models (e.g. bge-reranker-v2-m3) for reordering retrieved passages
  • BGE-M3 covers 100+ languages, accepts inputs up to 8192 tokens, and supports dense, lexical, and multi-vector retrieval in one model
  • Simple Python API: from_finetuned to load a model, encode to embed, then matrix-multiply for similarity
  • Optional fp16 inference (use_fp16=True) to reduce memory and speed up encoding
  • Optional finetuning extras installable via the FlagEmbedding[finetune] package
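
The reranker bullet above can be sketched roughly as follows. FlagReranker and compute_score follow the project's README as best understood here, so check the repo before relying on them. The helper rerank_passages, the loader load_bge_scorer, and the toy scorer are hypothetical names added for illustration. The helper accepts any scoring callable, so it can be tried without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes [query, passage] pairs and returns one relevance score per pair.
PairScorer = Callable[[List[List[str]]], Sequence[float]]


def rerank_passages(query: str, passages: List[str], score_pairs: PairScorer,
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages, best first."""
    if not passages:
        return []
    pairs = [[query, passage] for passage in passages]
    scores = [float(s) for s in score_pairs(pairs)]
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def load_bge_scorer(model_name: str = "BAAI/bge-reranker-v2-m3") -> PairScorer:
    """Assumed FlagReranker usage; the model is downloaded on first use."""
    from FlagEmbedding import FlagReranker  # imported lazily: heavy dependency

    reranker = FlagReranker(model_name, use_fp16=True)
    # normalize=True asks the library to map raw scores into the 0-1 range
    return lambda pairs: reranker.compute_score(pairs, normalize=True)


def toy_overlap_scorer(pairs: List[List[str]]) -> List[float]:
    """Stand-in scorer (NOT a BGE model): counts words shared by query and passage."""
    return [float(len(set(q.lower().split()) & set(p.lower().split())))
            for q, p in pairs]
```

To try it with a real model, pass load_bge_scorer() in place of toy_overlap_scorer. Keeping the scorer pluggable also makes it easy to swap in a different reranker later.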

Getting started

Install the library from PyPI, then load a BGE model and encode a couple of sentences to measure their similarity.

Install FlagEmbedding

Install the package from PyPI. Add the finetune extra only if you plan to train models.

bash
pip install -U FlagEmbedding
pip install -U "FlagEmbedding[finetune]"  # optional: only if you plan to finetune models

Load an embedding model

Load a BGE embedding model by its Hugging Face name. The retrieval instruction is a short prefix for search queries; the library prepends it when you call encode_queries, while plain encode embeds the text unchanged.

python
from FlagEmbedding import FlagAutoModel

model = FlagAutoModel.from_finetuned('BAAI/bge-base-en-v1.5',
                                      query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
                                      use_fp16=True)
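
For asymmetric search, where short queries are matched against longer passages, a minimal sketch might look like the block below. It assumes the loaded model exposes encode_queries and encode_corpus, as the FlagEmbedding embedders are understood to do. The function name score_queries_against_passages is hypothetical, and the dot product is written in plain Python so the sketch also runs with a stand-in model.

```python
from typing import List, Sequence


def score_queries_against_passages(model, queries: List[str],
                                   passages: List[str]) -> List[List[float]]:
    """Return one row per query and one column per passage, holding dot-product scores."""
    # Assumed API: encode_queries applies the retrieval instruction to each query
    query_vecs: Sequence[Sequence[float]] = model.encode_queries(queries)
    # Assumed API: encode_corpus embeds passages without the query instruction
    passage_vecs: Sequence[Sequence[float]] = model.encode_corpus(passages)
    return [
        [sum(float(a) * float(b) for a, b in zip(q, p)) for p in passage_vecs]
        for q in query_vecs
    ]
```

Called as score_queries_against_passages(model, ["what is BGE?"], ["BGE is an embedding model family."]), it returns a nested list whose first row holds the scores for the first query.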

Encode text and compare

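Encode two lists of sentences, then multiply the two embedding matrices to get a similarity matrix with one row per sentence in the first list and one column per sentence in the second. Embeddings are normalized by default (normalize_embeddings=True), so each dot product is a cosine similarity.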

python
sentences_1 = ["I love NLP", "I love machine learning"]
sentences_2 = ["I love BGE", "I love text retrieval"]
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)

similarity = embeddings_1 @ embeddings_2.T
print(similarity)
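
To turn a similarity matrix into ranked matches, one option is to sort each row. The sketch below is not part of FlagEmbedding: top_matches is a hypothetical helper, and the numbers are made-up stand-ins for the output of embeddings_1 @ embeddings_2.T. It works on nested lists and should also accept a NumPy array.

```python
from typing import List, Sequence, Tuple


def top_matches(similarity: Sequence[Sequence[float]], candidates: List[str],
                top_k: int = 1) -> List[List[Tuple[str, float]]]:
    """For each row of the similarity matrix, return the top_k candidates, best first."""
    results = []
    for row in similarity:
        if len(row) != len(candidates):
            raise ValueError("each row needs one score per candidate")
        # Column indices sorted by descending score
        order = sorted(range(len(candidates)), key=lambda j: row[j], reverse=True)
        results.append([(candidates[j], float(row[j])) for j in order[:top_k]])
    return results


# Made-up scores standing in for `embeddings_1 @ embeddings_2.T`
example_similarity = [[0.62, 0.41], [0.55, 0.71]]
example_candidates = ["I love BGE", "I love text retrieval"]
for matches in top_matches(example_similarity, example_candidates):
    print(matches)
```

In a real pipeline the rows would be queries and the columns indexed passages, with top_k set to however many candidates you want to hand to a reranker.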

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Generate text embeddings for the retrieval step of a RAG pipeline
  • Rerank the top candidates from a vector search before passing them to an LLM
  • Build multilingual search using BGE-M3 across 100+ languages
  • Run semantic similarity or deduplication over a collection of documents
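
The deduplication use case can be sketched as a greedy filter over embeddings. This is an illustrative approach, not something FlagEmbedding ships: cosine and deduplicate are hypothetical helpers, and the 0.95 threshold is an arbitrary assumption you would tune on your own data. The embeddings argument stands in for the output of model.encode on the same texts.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(texts: List[str], embeddings: Sequence[Sequence[float]],
                threshold: float = 0.95) -> List[str]:
    """Keep a text only if it is below the threshold against every text already kept."""
    kept_indices: List[int] = []
    for i in range(len(texts)):
        if all(cosine(embeddings[i], embeddings[k]) < threshold for k in kept_indices):
            kept_indices.append(i)
    return [texts[i] for i in kept_indices]
```

The loop compares each text against everything already kept, so the cost grows quadratically. For large collections an approximate nearest-neighbor index would be the usual replacement.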

How FlagEmbedding compares

FlagEmbedding alongside the other open-source reranking, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy | ★ 15.4k | A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval.
FlagEmbedding | ★ 11.8k | One-stop retrieval toolkit with the BGE embedding and reranker models for search and RAG
Vespa | ★ 7k | A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval.
RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
ColBERT | ★ 3.9k | The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching.