Overview
Tantivy is a full-text search engine library written in Rust. Its design is strongly inspired by Apache Lucene, and like Lucene it is a library rather than a ready-made server such as Elasticsearch or Solr: instead of running it as a standalone service, you add it as a crate and use it to build your own search engine.
It is aimed at Rust developers who need keyword search inside their own application or tool. You define a schema, index documents, and run queries directly from your code. A startup time under 10ms makes it a good fit for command-line tools, and multithreaded indexing handles large collections quickly.
In a RAG or hybrid-retrieval setup, Tantivy covers the lexical half of the pipeline. Its BM25 scoring (the same algorithm as Lucene) provides the keyword-matching signal that you can combine with vector search to rank results.
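To make the keyword-matching signal concrete, here is a minimal sketch of the textbook BM25 formula for a single query term. It is not taken from Tantivy's source: the function names are hypothetical, the `K1` and `B` values are the commonly quoted defaults (an assumption here), and the IDF uses the smoothed form usually associated with Lucene. Tantivy's internal implementation may differ in details such as how document lengths are stored.

```rust
/// Commonly quoted BM25 defaults (assumed here, not read from Tantivy).
const K1: f64 = 1.2; // term-frequency saturation
const B: f64 = 0.75; // strength of document-length normalization

/// Smoothed inverse document frequency.
/// `doc_count` is the number of documents in the collection,
/// `doc_freq` the number of documents that contain the term.
fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document.
/// A full query score is the sum of this value over the query terms.
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = term_freq as f64;
    // Documents longer than average are penalized, shorter ones boosted.
    let length_norm = 1.0 - B + B * (doc_len as f64 / avg_doc_len);
    idf(doc_count, doc_freq) * (tf * (K1 + 1.0)) / (tf + K1 * length_norm)
}
```

The score grows with term frequency but saturates, rare terms weigh more than common ones, and a match in a short document counts for more than the same match in a long one.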
What it does
- BM25 scoring, the same ranking algorithm used by Lucene
- Configurable tokenizer with stemming for 17 Latin languages, plus third-party support for Chinese, Japanese, and Korean
- Natural query language with boolean and phrase queries, e.g. (michael AND jackson) OR "king of pop"
- Incremental and multithreaded indexing, with a memory-mapped (mmap) directory
- Many field types: text, i64, u64, f64, dates, IP, bool, hierarchical facets, and JSON fields
- Faceted search, range queries, and an aggregation collector for histograms, range buckets, and stats
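To illustrate what the boolean and phrase queries in the list above mean, the following toy sketch evaluates (michael AND jackson) OR "king of pop" over a tiny in-memory positional inverted index. It uses only the Rust standard library and is not how Tantivy stores or searches data; `ToyIndex`, `tokenize`, and `example_query` are hypothetical names made up for this example.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Toy positional inverted index: term -> (doc id -> token positions).
#[derive(Default)]
struct ToyIndex {
    postings: HashMap<String, BTreeMap<u32, Vec<usize>>>,
}

/// Lowercase the text and split on anything that is not alphanumeric.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

impl ToyIndex {
    /// Record every token of `text` together with its position.
    fn add(&mut self, doc_id: u32, text: &str) {
        for (pos, term) in tokenize(text).into_iter().enumerate() {
            self.postings.entry(term).or_default().entry(doc_id).or_default().push(pos);
        }
    }

    /// Documents that contain `term`.
    fn term(&self, term: &str) -> BTreeSet<u32> {
        self.postings
            .get(&term.to_lowercase())
            .map(|docs| docs.keys().copied().collect())
            .unwrap_or_default()
    }

    /// Documents that contain the words of `phrase` at consecutive positions.
    fn phrase(&self, phrase: &str) -> BTreeSet<u32> {
        let terms = tokenize(phrase);
        let Some(first) = terms.first().and_then(|t| self.postings.get(t)) else {
            return BTreeSet::new();
        };
        first
            .iter()
            .filter(|(doc, starts)| {
                // Some occurrence of the first word must be followed by the rest.
                starts.iter().any(|&start| {
                    terms.iter().enumerate().skip(1).all(|(offset, t)| {
                        self.postings
                            .get(t)
                            .and_then(|docs| docs.get(*doc))
                            .map_or(false, |ps| ps.contains(&(start + offset)))
                    })
                })
            })
            .map(|(doc, _)| *doc)
            .collect()
    }
}

/// (michael AND jackson) OR "king of pop": AND is a set intersection,
/// OR is a set union.
fn example_query(index: &ToyIndex) -> BTreeSet<u32> {
    let both_names = &index.term("michael") & &index.term("jackson");
    &both_names | &index.phrase("king of pop")
}
```

With documents such as "Michael Jackson released Thriller" and "The King of Pop toured worldwide" added through `add`, `example_query` returns both ids, while a document that only mentions "Michael Jordan" matches neither branch.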
Getting started
Tantivy works on stable Rust and supports Linux, macOS, and Windows. Add the crate to your project, then define a schema, index documents, and query them.
Add the crate
Add Tantivy to your Cargo.toml dependencies.
[dependencies]
tantivy = "0.26.1"
Define a schema and index a document
Build a schema, create an index on disk, then add documents and commit so they become searchable.
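use tantivy::schema::*;
use tantivy::{doc, Index};
// The `?` operators assume an enclosing function that returns `tantivy::Result<()>`.
// TEXT fields are tokenized and indexed; STORED keeps the original value for retrieval.
let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field("title", TEXT | STORED);
let body = schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();
// The "./index" directory must already exist.
let index = Index::create_in_dir("./index", schema.clone())?;
// The argument is the writer's memory budget in bytes.
let mut writer = index.writer(100_000_000)?;
writer.add_document(doc!(
    title => "The Old Man and the Sea",
    body => "He was an old man who fished alone..."
))?;
// Documents become searchable only after a commit.
writer.commit()?;
Run a search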
Acquire a reader and searcher, parse a query with the QueryParser, and collect the top results. Documents are only visible after a commit.
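use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
// Continues from the indexing snippet: `index`, `title`, and `body` are defined there.
let reader = index.reader()?;
let searcher = reader.searcher();
// `title` and `body` are the default fields searched when the query names no field.
let query_parser = QueryParser::for_index(&index, vec![title, body]);
let query = query_parser.parse_query("sea")?;
// Collect the 10 best-scoring hits.
let results = searcher.search(&query, &TopDocs::with_limit(10))?;
Try it from other languages or the CLI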
Bindings exist for Python (tantivy-py) and Ruby (tantiny), and tantivy-cli lets you build and query an index from the command line.
Commands and code are distilled from the project's own documentation; always check the official repository for the latest version.
When to use it
- Add full-text keyword search to a Rust application without running a separate search server
- Supply the lexical (BM25) half of a hybrid retrieval pipeline that also uses vector search
- Build fast command-line search tools that benefit from sub-10ms startup
- Index and search large document collections, such as logs, emails, or a Wikipedia-scale corpus, with faceted and range queries
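For the hybrid retrieval use case above, one common way to merge a lexical ranking with a vector-search ranking is reciprocal rank fusion (RRF). RRF is not part of Tantivy; it is shown here as one possible fusion step, and the function name, the document ids, and the constant `k = 60` are illustrative assumptions. The sketch only needs each ranker's ordered list of document ids, not their raw scores.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: every ranking contributes 1 / (k + rank) to each
/// document it contains, with ranks starting at 1. A larger `k` flattens the
/// difference between top and lower ranks.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc_id) in ranking.iter().enumerate() {
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + (i as f64) + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest fused score first; ties are broken by id to keep the order stable.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

In practice the first list would come from a BM25 search and the second from a vector index. Because RRF works on ranks alone, the two score scales never need to be normalized against each other.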
How Tantivy compares
Tantivy alongside the other open-source reranking, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval. |
| Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search. |
| Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API. |
| Tantivy | ★ 15.4k | Fast full-text search engine library in Rust with BM25 scoring. |
| FlagEmbedding | ★ 11.8k | BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines. |
| Vespa | ★ 7k | A search and serving engine that natively combines vector, keyword (BM25), and structured search with built-in ranking for large-scale retrieval. |
| RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines. |
| ColBERT | ★ 3.9k | The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching. |