Overview
Vespa is an open-source engine for storing, searching, and ranking large amounts of data at serving time. It handles vectors, tensors, text, and structured fields in one place, so you can run vector search, keyword (BM25) search, and filters on structured data within a single query.
It is built for teams that need low-latency retrieval over data that keeps changing, often returning results in under 100 milliseconds while the corpus is being updated. Vespa distributes data across multiple nodes and evaluates queries in parallel, which is why it is used by internet services that handle hundreds of thousands of queries per second.
In the RAG and retrieval space, Vespa fits the hybrid-search and reranking role: instead of bolting a vector index onto a separate keyword index, it does both natively and lets you express custom ranking, including machine-learned models, as part of the application config.
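As a rough illustration of what "both in a single query" can look like, the sketch below builds a JSON body for Vespa's HTTP query API that combines a nearest-neighbor operator, a keyword query, and a structured filter. The `embedding` field, the `year` filter, the query tensor name `q`, and the `hybrid` rank profile are assumptions made for this example; they would have to be defined in your own schema and are not taken from any particular sample application.

```python
import json


def build_hybrid_query(text, query_vector, max_year, hits=10):
    """Build a JSON-serializable body for POST /search/.

    Assumptions (hypothetical, define them in your own schema):
      - document type "music" has a tensor field "embedding"
      - it also has an integer field "year"
      - a rank profile named "hybrid" declares the input query(q)
    """
    yql = (
        "select * from music where "
        "(({targetHits:100}nearestNeighbor(embedding, q)) or userQuery()) "
        f"and year <= {int(max_year)}"
    )
    return {
        "yql": yql,
        "query": text,                          # consumed by userQuery()
        "input.query(q)": list(query_vector),   # query tensor for nearestNeighbor
        "ranking": "hybrid",                    # hypothetical rank profile name
        "hits": hits,
    }


body = build_hybrid_query("head", [0.1, 0.2, 0.3], max_year=2010)
payload = json.dumps(body)  # ready to send as the request body
```

How the vector and keyword signals are weighted against each other is decided by the rank profile in the application package, not by the request itself.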
What it does
- Combines vector, tensor, keyword (BM25), and structured search in a single query
- Built-in ranking framework, including evaluation of machine-learned models at serving time
- Designed for low-latency retrieval, often under 100 ms, over continuously changing data
- Scales horizontally across multiple nodes with high availability and parallel query evaluation
- Vespa CLI plus a JSON document/query API for feeding data and running searches
- Run locally with Docker or deploy to the managed Vespa Cloud service
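To make the JSON document format more concrete, here is a minimal sketch that builds `put` operations and serializes them as JSON Lines, the kind of file `vespa feed` accepts. The namespace, document IDs, and field values are placeholders invented for this example; the field names must match whatever your schema defines.

```python
import json


def put_operation(namespace, doc_type, doc_id, fields):
    """Return one Vespa 'put' operation as a plain dict.

    Document IDs follow the pattern id:<namespace>:<document-type>::<user-id>.
    """
    return {"put": f"id:{namespace}:{doc_type}::{doc_id}", "fields": fields}


def to_jsonl(operations):
    """Serialize operations as JSON Lines: one operation per line."""
    return "".join(json.dumps(op) + "\n" for op in operations)


# Placeholder documents; names and values are illustrative only.
ops = [
    put_operation("mynamespace", "music", "example-1",
                  {"artist": "Example Artist", "album": "First Album", "year": 2001}),
    put_operation("mynamespace", "music", "example-2",
                  {"artist": "Example Artist", "album": "Second Album", "year": 2005}),
]
jsonl_text = to_jsonl(ops)
```

Writing `jsonl_text` to a file gives something you could pass to `vespa feed`, assuming the target schema contains matching fields.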
Getting started
You do not need to build Vespa from source to use it. The quickest path is the Vespa CLI with a local Docker container running a sample application.
Install the Vespa CLI
Install the CLI with Homebrew, or download a build from the GitHub releases page if Homebrew is not available.
brew install vespa-cli
Start Vespa locally with Docker
Point the CLI at a local target, then run the Vespa container, publishing the query (8080) and config (19071) ports.
vespa config set target local
docker run --detach --name vespa --hostname vespa-container \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
Deploy a sample app and feed data
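Before deploying, it can help to wait until the config server on port 19071 reports healthy, since the container takes a little while to start. The sketch below polls the `/state/v1/health` endpoint and treats a status code of `up` as ready. The retry count and delay are arbitrary choices for this example, and the helper names are hypothetical.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:19071/state/v1/health"


def is_up(payload):
    """Return True if a health payload reports status code 'up'."""
    return payload.get("status", {}).get("code") == "up"


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """Fetch and decode the health JSON from the config server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_up(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until the service reports 'up' or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_up(fetch()):
                return True
        except (OSError, ValueError):
            pass  # not reachable yet, or the response was not valid JSON
        sleep(delay)
    return False
```

Calling `wait_until_up()` with no arguments polls the local container for up to about a minute; the `fetch` and `sleep` parameters exist only so the loop can be exercised without a running container.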
Clone the album-recommendation sample application, deploy it, and feed the included documents.
vespa clone album-recommendation myapp && cd myapp
vespa deploy --wait 300 ./app
vespa feed dataset/documents.jsonl
Run a query
Query the indexed documents with the CLI using YQL, Vespa's SQL-like query language.
vespa query "select * from music where album contains 'head'"
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
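The same YQL can also be sent to the HTTP query API on the published port 8080. The following is a minimal sketch using only the Python standard library, assuming the local container from the steps above is running; the helper names are hypothetical, and the response handling relies on hits being listed under `root.children`.

```python
import json
import urllib.request


def build_search_request(yql, endpoint="http://localhost:8080", hits=10):
    """Build a POST request for the /search/ endpoint with a JSON body."""
    body = json.dumps({"yql": yql, "hits": hits}).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/search/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_hits(response):
    """Return (relevance, fields) pairs from a decoded query response."""
    children = response.get("root", {}).get("children", [])
    return [(c.get("relevance"), c.get("fields", {})) for c in children]


def search(yql):
    """Send the query to the local container and return its hits."""
    with urllib.request.urlopen(build_search_request(yql), timeout=5) as resp:
        return extract_hits(json.load(resp))
```

With the sample application deployed and fed, `search("select * from music where album contains 'head'")` should return the same documents as the CLI command.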
When to use it
- Hybrid search that blends dense vector similarity with BM25 keyword matching and structured filters in one query
- Retrieval and reranking backend for a RAG pipeline over a large, frequently updated corpus
- Recommendation and personalization systems that evaluate machine-learned ranking models at serving time
- Low-latency search at hundreds of thousands of queries per second across a distributed, multi-node cluster
How Vespa compares
Vespa alongside the other open-source reranking, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval. |
| Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search. |
| Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API. |
| Tantivy | ★ 15.4k | A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval. |
| FlagEmbedding | ★ 11.8k | BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines. |
| Vespa | ★ 7k | One engine for vector, keyword, and structured search with built-in ranking at scale. |
| RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines. |
| ColBERT | ★ 3.9k | The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching. |