Tantivy

Fast full-text search engine library in Rust with BM25 scoring

Overview

Tantivy is a full-text search engine library written in Rust. Its design is strongly inspired by Apache Lucene, and like Lucene it is a library rather than a ready-made server such as Elasticsearch or Solr: instead of running it as a standalone service, you add it as a crate and use it to build your own search engine.

It is aimed at Rust developers who need keyword search inside their own application or tool. You define a schema, index documents, and run queries directly from your code. A startup time under 10ms makes it a good fit for command-line tools, and multithreaded indexing handles large collections quickly.

In a RAG or hybrid-retrieval setup, Tantivy covers the lexical half of the pipeline. Its BM25 scoring (the same algorithm as Lucene) provides the keyword-matching signal that you can combine with vector search to rank results.
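One common way to combine a lexical ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is an illustration only: RRF is a technique chosen here as an example, not something Tantivy provides, and the function name, the `u64` document ids, and the constant `k` are assumptions. It uses only the standard library and takes the ranked id lists as given, wherever they come from.

```rust
use std::collections::HashMap;

/// Fuses several ranked lists of document ids with reciprocal rank fusion.
/// Each list is ordered best-first. `k` dampens the weight of top ranks;
/// 60 is a commonly used value, but it is a tunable assumption.
pub fn reciprocal_rank_fusion(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (position, doc_id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the best hit contributes 1 / (k + 1).
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + position as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by document id for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Calling `reciprocal_rank_fusion(&[lexical_ids, vector_ids], 60.0)` returns the documents ordered by fused score. Because RRF works on ranks rather than raw scores, the BM25 and vector-similarity scales never need to be normalized against each other.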

What it does

BM25 scoring, the same ranking algorithm used by Lucene
Configurable tokenizer with stemming for 17 Latin languages, plus third-party support for Chinese, Japanese, and Korean
Natural query language with boolean and phrase queries, e.g. (michael AND jackson) OR "king of pop"
Incremental and multithreaded indexing, with a memory-mapped (mmap) directory
Many field types: text, i64, u64, f64, dates, IP, bool, hierarchical facets, and JSON fields
Faceted search, range queries, and an aggregation collector for histograms, range buckets, and stats
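To give a feel for what BM25 scoring computes, here is a minimal sketch of the textbook per-term formula using only the standard library. The function names, the parameter values `K1 = 1.2` and `B = 0.75` (conventional defaults), and the smoothed IDF variant are assumptions for illustration; Tantivy's internal implementation may differ in details such as how document lengths are stored.

```rust
/// BM25 free parameters; 1.2 and 0.75 are conventional defaults (assumed here).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency in the smoothed, always-positive form.
/// `doc_count` is the number of documents, `doc_freq` how many contain the term.
pub fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document's score.
pub fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = term_freq as f64;
    // Documents longer than average are penalized, shorter ones boosted.
    let length_norm = 1.0 - B + B * (doc_len as f64 / avg_doc_len);
    // Term frequency saturates: each extra occurrence adds less than the last.
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + K1 * length_norm)
}
```

A document's score for a multi-term query is the sum of these per-term contributions. Rare terms (low `doc_freq`) weigh more, and repeated occurrences help with diminishing returns.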

Getting started

Tantivy works on stable Rust and supports Linux, macOS, and Windows. Add the crate to your project, then define a schema, index documents, and query them.

Add the crate

Add Tantivy to your Cargo.toml dependencies.

toml

[dependencies]
tantivy = "0.26.1"

Define a schema and index a document

Build a schema, create an index on disk, then add documents and commit so they become searchable.

rust

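use tantivy::schema::*;
use tantivy::{doc, Index, IndexWriter};

// Assumes this runs inside a function returning tantivy::Result<()>,
// so that `?` can propagate errors.

// Declare the fields: `title` is tokenized, indexed, and stored;
// `body` is tokenized and indexed but not stored.
let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let body = schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();

// `create_in_dir` expects the ./index directory to already exist.
let index = Index::create_in_dir("./index", schema.clone())?;
// 100 MB memory budget for the indexing threads.
let mut writer: IndexWriter = index.writer(100_000_000)?;
writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body  => "He was an old man who fished alone..."
))?;
// Documents become searchable only after the commit.
writer.commit()?;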

Run a search

Acquire a reader and searcher, parse a query with the QueryParser, and collect the top results. Documents are only visible after a commit.

rust

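use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;

// Continues from the indexing snippet: `index`, `title`, and `body` are in scope.
let reader = index.reader()?;
let searcher = reader.searcher();
// Terms without an explicit field are searched in both `title` and `body`.
let query_parser = QueryParser::for_index(&index, vec![title, body]);
let query = query_parser.parse_query("sea")?;
// Collect up to 10 of the best-scoring hits.
let results = searcher.search(&query, &TopDocs::with_limit(10))?;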

Try it from other languages or the CLI

Bindings exist for Python (tantivy-py) and Ruby (tantiny), and tantivy-cli lets you build and query an index from the command line.

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Add full-text keyword search to a Rust application without running a separate search server
Supply the lexical (BM25) half of a hybrid retrieval pipeline that also uses vector search
Build fast command-line search tools that benefit from sub-10ms startup
Index and search large document collections, such as logs, emails, or a Wikipedia-scale corpus, with faceted and range queries

How Tantivy compares

Tantivy alongside the other open-source reranking, search, and hybrid-retrieval tools tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
Elasticsearch	★ 77.1k	Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud	★ 58.2k	Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud	★ 26.1k	Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy	★ 15.4k	Fast full-text search engine library in Rust with BM25 scoring
FlagEmbedding	★ 11.8k	BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines.
Vespa	★ 7k	A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval.
RAGatouille	★ 3.9k	A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
ColBERT	★ 3.9k	The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching.
