FlagEmbedding

One-stop retrieval toolkit with the BGE embedding and reranker models for search and RAG

github.com/FlagOpen/FlagEmbedding · ★ 11.8k · bge-model.com

Overview

FlagEmbedding is an open-source retrieval toolkit from BAAI (Beijing Academy of Artificial Intelligence). It packages the widely used BGE family of models, including dense text embedding models and cross-encoder rerankers, behind a single Python library so you can turn text into vectors and score query-document pairs.

It is aimed at developers building search and RAG (retrieval-augmented generation) pipelines. You load a model from Hugging Face by name, call encode to get embeddings, and compute similarity to find relevant passages. The library also covers the multilingual BGE-M3 model, which supports dense, lexical, and multi-vector retrieval in one model.

Within the rerankers and hybrid-search category, FlagEmbedding gives you both halves of a retrieval stack: embedding models for the first-stage recall and reranker models to reorder the top candidates before they reach your LLM.
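
To make the two stages concrete, here is a minimal sketch of the recall-then-rerank flow in plain Python. It does not call FlagEmbedding: the 2-d vectors are made up, and toy_rerank_score is a hypothetical stand-in for a cross-encoder reranker, so treat it as an illustration of the control flow only.

```python
from typing import Callable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: List[str],
    doc_vecs: List[Sequence[float]],
    rerank_score: Callable[[str, str], float],
    recall_k: int = 3,
    final_k: int = 2,
) -> List[Tuple[str, float]]:
    # Stage 1 (recall): rank every document by embedding similarity, keep recall_k.
    order = sorted(range(len(docs)),
                   key=lambda i: dot(query_vec, doc_vecs[i]),
                   reverse=True)
    candidates = [docs[i] for i in order[:recall_k]]
    # Stage 2 (rerank): rescore only the shortlist with the slower pairwise scorer.
    rescored = [(doc, rerank_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final_k]


def toy_rerank_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


# Made-up 2-d vectors for illustration; real embeddings have far more dimensions.
docs = [
    "BGE models embed text",
    "Rerankers score query-passage pairs",
    "Bananas are yellow",
    "Embedding text for search",
]
doc_vecs = [[0.9, 0.1], [0.6, 0.4], [0.0, 1.0], [0.8, 0.2]]
query = "embed text for search"
query_vec = [1.0, 0.0]

results = retrieve_then_rerank(query, query_vec, docs, doc_vecs, toy_rerank_score)
for doc, score in results:
    print(f"{score:.1f}  {doc}")
```

Note that the reranker changes the order: the first stage ranks "BGE models embed text" highest, while the second stage promotes "Embedding text for search". In a real pipeline the stand-in scorer would be replaced by a reranker model such as bge-reranker-v2-m3.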

What it does

BGE embedding models for dense text retrieval, loaded by name from Hugging Face
Cross-encoder reranker models (e.g. bge-reranker-v2-m3) for reordering retrieved passages
BGE-M3 supports 100+ languages, inputs up to 8192 tokens, and dense, lexical, and multi-vector retrieval in one model
Simple Python API: from_finetuned to load a model, encode to embed, then matrix-multiply for similarity
Optional fp16 inference (use_fp16=True) to reduce memory and speed up encoding
Optional finetuning extras installable via the FlagEmbedding[finetune] package
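
The three BGE-M3 retrieval modes listed above each produce a different kind of score. The sketch below shows one common way such scores are computed and blended, in plain Python. The input shapes (a dense vector, a token-to-weight dictionary, and a list of per-token vectors), the function names, and the blend weights are all assumptions for illustration, not the library's API or recommended settings.

```python
from typing import Dict, List, Sequence


def dense_score(q: Sequence[float], d: Sequence[float]) -> float:
    """Dense retrieval: one vector per text, compared with a dot product."""
    return sum(a * b for a, b in zip(q, d))


def lexical_score(q_weights: Dict[str, float], d_weights: Dict[str, float]) -> float:
    """Lexical retrieval: sum the weight products of tokens both texts share."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)


def multi_vector_score(q_vecs: List[Sequence[float]],
                       d_vecs: List[Sequence[float]]) -> float:
    """Multi-vector (late interaction): each query token vector takes its best
    match among the document token vectors; the matches are then averaged."""
    best = [max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs]
    return sum(best) / len(best)


def hybrid_score(dense: float, lexical: float, multi: float,
                 weights: Sequence[float] = (0.4, 0.2, 0.4)) -> float:
    """Weighted blend of the three scores; the weights here are arbitrary."""
    return weights[0] * dense + weights[1] * lexical + weights[2] * multi


# Made-up inputs for one query and one passage.
d_score = dense_score([0.6, 0.8], [0.8, 0.6])
l_score = lexical_score({"bge": 0.5, "model": 0.2}, {"bge": 0.4, "toolkit": 0.3})
m_score = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.6, 0.8]])
combined = hybrid_score(d_score, l_score, m_score)
print(round(d_score, 3), round(l_score, 3), round(m_score, 3), round(combined, 3))
```

Blending lets exact term matches (lexical) compensate for cases where the dense vectors alone miss a rare keyword; the right weights depend on your data and should be tuned.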

Getting started

Install the library from PyPI, then load a BGE model and encode a couple of sentences to measure their similarity.

Install FlagEmbedding

Install the package from PyPI. Add the finetune extra (FlagEmbedding[finetune]) only if you plan to train models.

bash

pip install -U FlagEmbedding

Load an embedding model

Load a BGE embedding model by its Hugging Face name. The query_instruction_for_retrieval string is an instruction intended for search queries rather than passages, and use_fp16=True turns on half-precision inference.

python

from FlagEmbedding import FlagAutoModel

model = FlagAutoModel.from_finetuned('BAAI/bge-base-en-v1.5',
                                      query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
                                      use_fp16=True)

Encode text and compare

Encode two lists of sentences, then multiply the first embedding matrix by the transpose of the second to get a similarity matrix. Entry (i, j) scores sentences_1[i] against sentences_2[j].

python

sentences_1 = ["I love NLP", "I love machine learning"]
sentences_2 = ["I love BGE", "I love text retrieval"]
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)

similarity = embeddings_1 @ embeddings_2.T
print(similarity)

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.
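
Once you have a similarity matrix, the usual next step is to pick the best-scoring candidates for each query. The sketch below does that in plain Python on a nested list (a NumPy array can be converted with .tolist()). The top_matches helper is hypothetical, and the scores are illustrative numbers, not real model output.

```python
from typing import List, Sequence, Tuple


def top_matches(similarity: Sequence[Sequence[float]],
                candidates: Sequence[str],
                k: int = 1) -> List[List[Tuple[str, float]]]:
    """For each row (one query), return the k highest-scoring candidates."""
    ranked = []
    for row in similarity:
        # Sort column indices by score, highest first, and keep the top k.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        ranked.append([(candidates[j], row[j]) for j in order])
    return ranked


# Illustrative scores only: rows are queries, columns are candidates.
similarity_rows = [
    [0.65, 0.58],
    [0.62, 0.70],
]
candidates = ["I love BGE", "I love text retrieval"]

best = top_matches(similarity_rows, candidates, k=1)
for matches in best:
    print(matches)
```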

When to use it

Generate text embeddings for the retrieval step of a RAG pipeline
Rerank the top candidates from a vector search before passing them to an LLM
Build multilingual search using BGE-M3 across 100+ languages
Run semantic similarity or deduplication over a collection of documents
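
For the deduplication use case, one simple approach is a greedy pass that drops any document whose embedding is too close to one already kept. This is a sketch under assumptions: the deduplicate helper is hypothetical, the 2-d vectors are made up, and the 0.95 threshold is an arbitrary starting point you would tune on your own data.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def deduplicate(embeddings: List[Sequence[float]],
                threshold: float = 0.95) -> List[int]:
    """Return indices of documents to keep, scanning in order.

    A document is kept only if its similarity to every previously kept
    document is below the threshold.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Made-up vectors: the second is nearly parallel to the first.
vectors = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
kept = deduplicate(vectors)
print(kept)
```

The pass compares each document against every kept document, so it suits small to mid-sized collections; larger ones typically need an approximate nearest-neighbor index instead.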

How FlagEmbedding compares

FlagEmbedding alongside other open-source reranking, search, and hybrid-retrieval tools tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
Elasticsearch	★ 77.1k	Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud	★ 58.2k	Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud	★ 26.1k	Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy	★ 15.4k	A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval.
FlagEmbedding	★ 11.8k	One-stop retrieval toolkit with the BGE embedding and reranker models for search and RAG.
Vespa	★ 7k	A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval.
RAGatouille	★ 3.9k	A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
ColBERT	★ 3.9k	The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching.
