AI/TLDR

Tantivy

Fast full-text search engine library in Rust with BM25 scoring

Overview

Tantivy is a full-text search engine library written in Rust. Its design is strongly inspired by Apache Lucene, and like Lucene it is a library rather than a ready-made server such as Elasticsearch or Solr: instead of running it as a standalone service, you add it as a crate and use it to build your own search engine.

It is aimed at Rust developers who need keyword search inside their own application or tool. You define a schema, index documents, and run queries directly from your code. A startup time under 10ms makes it a good fit for command-line tools, and multithreaded indexing handles large collections quickly.

In a RAG or hybrid-retrieval setup, Tantivy covers the lexical half of the pipeline. Its BM25 scoring (the same algorithm as Lucene) provides the keyword-matching signal that you can combine with vector search to rank results.
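
To make the BM25 signal concrete, here is a minimal sketch of the textbook formula in plain Rust, with no Tantivy dependency. The function names are hypothetical, and the constants `K1 = 1.2` and `B = 0.75` are the commonly used defaults, assumed here rather than read from Tantivy's configuration. It illustrates how the score behaves and is not the library's implementation.

```rust
// Commonly used BM25 defaults (an assumption for this sketch).
const K1: f64 = 1.2; // controls term-frequency saturation
const B: f64 = 0.75; // controls document-length normalization

/// Inverse document frequency: terms found in fewer documents weigh more.
pub fn idf(doc_count: f64, doc_freq: f64) -> f64 {
    (1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5)).ln()
}

/// BM25 contribution of one query term to one document.
///
/// * `term_freq`   - occurrences of the term in the document
/// * `doc_freq`    - number of documents that contain the term
/// * `doc_count`   - total number of documents in the collection
/// * `doc_len`     - length of this document in tokens
/// * `avg_doc_len` - average document length in tokens
pub fn bm25_term(
    term_freq: f64,
    doc_freq: f64,
    doc_count: f64,
    doc_len: f64,
    avg_doc_len: f64,
) -> f64 {
    // Documents longer than average are penalized, shorter ones boosted.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf(doc_count, doc_freq) * term_freq * (K1 + 1.0) / (term_freq + norm)
}

/// Score of a document for a whole query: the sum over its terms.
/// Each entry of `terms` is a `(term_freq, doc_freq)` pair.
pub fn bm25(terms: &[(f64, f64)], doc_count: f64, doc_len: f64, avg_doc_len: f64) -> f64 {
    terms
        .iter()
        .map(|&(tf, df)| bm25_term(tf, df, doc_count, doc_len, avg_doc_len))
        .sum()
}
```

Calling `bm25_term(2.0, 5.0, 1000.0, 80.0, 100.0)` scores a term that appears twice in a slightly shorter-than-average document and is present in 5 of 1,000 documents. Real engines typically store document lengths in a compressed form, so their scores can differ slightly from this sketch.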

What it does

  • BM25 scoring, the same ranking algorithm used by Lucene
  • Configurable tokenizer with stemming for 17 Latin languages, plus third-party support for Chinese, Japanese, and Korean
  • Natural query language with boolean and phrase queries, e.g. (michael AND jackson) OR "king of pop"
  • Incremental and multithreaded indexing, with a memory-mapped (mmap) directory
  • Many field types: text, i64, u64, f64, dates, IP addresses, bool, hierarchical facets, and JSON fields
  • Faceted search, range queries, and an aggregation collector for histograms, range buckets, and stats
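
To show what a query such as (michael AND jackson) OR "king of pop" means, the sketch below evaluates it against a toy in-memory inverted index written in plain Rust. `ToyIndex`, `tokenize`, `and`, and `or` are hypothetical names invented for this illustration; Tantivy's own query parser and index structures are far more capable and are not used here.

```rust
use std::collections::{BTreeSet, HashMap};

/// Lowercase the text and split it on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// A toy inverted index: for each term, the ids of the documents containing it.
pub struct ToyIndex {
    docs: Vec<Vec<String>>,                     // tokens of each document, by id
    postings: HashMap<String, BTreeSet<usize>>, // term -> matching document ids
}

impl ToyIndex {
    pub fn new(texts: &[&str]) -> Self {
        let docs: Vec<Vec<String>> = texts.iter().map(|t| tokenize(t)).collect();
        let mut postings: HashMap<String, BTreeSet<usize>> = HashMap::new();
        for (id, tokens) in docs.iter().enumerate() {
            for token in tokens {
                postings.entry(token.clone()).or_default().insert(id);
            }
        }
        ToyIndex { docs, postings }
    }

    /// Documents that contain a single term.
    pub fn term(&self, word: &str) -> BTreeSet<usize> {
        self.postings
            .get(&word.to_lowercase())
            .cloned()
            .unwrap_or_default()
    }

    /// Documents that contain the words of `phrase` consecutively, in order.
    pub fn phrase(&self, phrase: &str) -> BTreeSet<usize> {
        let words = tokenize(phrase);
        if words.is_empty() {
            return BTreeSet::new();
        }
        // Start from documents containing the first word, then check adjacency.
        self.term(&words[0])
            .into_iter()
            .filter(|&id| {
                self.docs[id]
                    .windows(words.len())
                    .any(|w| w == words.as_slice())
            })
            .collect()
    }
}

/// Boolean AND: documents present in both result sets.
pub fn and(a: &BTreeSet<usize>, b: &BTreeSet<usize>) -> BTreeSet<usize> {
    a.intersection(b).copied().collect()
}

/// Boolean OR: documents present in either result set.
pub fn or(a: &BTreeSet<usize>, b: &BTreeSet<usize>) -> BTreeSet<usize> {
    a.union(b).copied().collect()
}
```

With an index built from a few strings, the example query becomes `or(&and(&idx.term("michael"), &idx.term("jackson")), &idx.phrase("king of pop"))`. A document matches if it contains both names, or if it contains the three phrase words next to each other in that order.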

Getting started

Tantivy works on stable Rust and supports Linux, macOS, and Windows. Add the crate to your project, then define a schema, index documents, and query them.

Add the crate

Add Tantivy to your Cargo.toml dependencies.

toml
[dependencies]
tantivy = "0.26.1"

Define a schema and index a document

Build a schema, create an index on disk, then add documents and commit so they become searchable.

rust
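use tantivy::schema::*;
use tantivy::{doc, Index, IndexWriter};

// This snippet uses `?`, so it belongs inside a function that returns tantivy::Result<()>.

// `title` is indexed and stored (returned with hits); `body` is indexed only.
let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let body = schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();

// Create the index on disk. The ./index directory must already exist.
let index = Index::create_in_dir("./index", schema.clone())?;

// The argument is the writer's memory budget in bytes (about 100 MB here).
let mut writer: IndexWriter = index.writer(100_000_000)?;
writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body  => "He was an old man who fished alone..."
))?;

// Documents become searchable only once the commit succeeds.
writer.commit()?;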

Run a search

Acquire a reader and searcher, parse a query with the QueryParser, and collect the top results. Documents are only visible after a commit.

rust
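use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;

// Continues from the indexing snippet: `index`, `title`, and `body` are defined there.
let reader = index.reader()?;
let searcher = reader.searcher();

// Query terms without an explicit field are searched in both `title` and `body`.
let query_parser = QueryParser::for_index(&index, vec![title, body]);
let query = query_parser.parse_query("sea")?;

// `results` holds (score, DocAddress) pairs for the 10 best matches, highest score first.
let results = searcher.search(&query, &TopDocs::with_limit(10))?;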

Try it from other languages or the CLI

Bindings exist for Python (tantivy-py) and Ruby (tantiny), and tantivy-cli lets you build and query an index from the command line.

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Add full-text keyword search to a Rust application without running a separate search server
  • Supply the lexical (BM25) half of a hybrid retrieval pipeline that also uses vector search
  • Build fast command-line search tools that benefit from sub-10ms startup
  • Index and search large document collections, such as logs, emails, or a Wikipedia-scale corpus, with faceted and range queries
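
For the hybrid-retrieval case, one common way to merge a BM25 result list with a vector-search result list is reciprocal rank fusion (RRF). The sketch below is plain Rust and assumes each retriever returns document ids ordered from best to worst. The function name and the choice of RRF itself are assumptions made for illustration; Tantivy does not prescribe a fusion method.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with reciprocal rank fusion.
///
/// Each document receives `1 / (k + rank)` from every list it appears in,
/// where `rank` starts at 1 for the best hit. `k` dampens the influence of
/// top ranks; 60 is a frequently used value (an assumption, tune as needed).
/// Returns `(doc_id, fused_score)` pairs, best first.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (position, doc_id) in ranking.iter().enumerate() {
            let rank = position as f64 + 1.0;
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }

    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();

    // Highest fused score first; ties are broken by id for stable output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

In practice the first list would come from a Tantivy BM25 search and the second from a vector index, for example `reciprocal_rank_fusion(&[bm25_ids, vector_ids], 60.0)`. Because RRF only looks at ranks, it avoids having to calibrate BM25 scores against cosine similarities.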

How Tantivy compares

Tantivy alongside other open-source rerank, search & hybrid tools AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy | ★ 15.4k | Fast full-text search engine library in Rust with BM25 scoring
FlagEmbedding | ★ 11.8k | BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines.
Vespa | ★ 7k | A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval.
RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
ColBERT | ★ 3.9k | The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching.