AI/TLDR

Vespa

One engine for vector, keyword, and structured search with built-in ranking at scale

Overview

Vespa is an open-source engine for storing, searching, and ranking large amounts of data at serving time. It handles vectors, tensors, text, and structured fields in one place, so you can run vector search, keyword (BM25) search, and filters on structured data within a single query.

It is built for teams that need low-latency retrieval over data that keeps changing, often returning results in under 100 milliseconds while the corpus is being updated. Vespa distributes data across multiple nodes and evaluates queries in parallel, which is why it is used by internet services that handle hundreds of thousands of queries per second.

In the RAG and retrieval space, Vespa fits the hybrid-search and reranking role: instead of bolting a vector index onto a separate keyword index, it does both natively and lets you express custom ranking, including machine-learned models, as part of the application config.
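
As a rough illustration of what a single hybrid request can look like, the sketch below builds a query body that combines a nearest-neighbor clause, a keyword match, and a structured filter. The document type `doc`, the fields `embedding` and `year`, the query tensor name `q`, and the rank profile name `hybrid` are all assumptions made for this example; they would have to match a schema you define yourself.

```python
import json


def hybrid_query_body(text, query_vector, min_year, hits=10):
    """Build a Vespa query body mixing vector, keyword, and filter clauses.

    Assumed (hypothetical) schema: document type `doc` with a tensor field
    `embedding`, an integer field `year`, and a rank profile named `hybrid`
    that declares the query tensor `q`.
    """
    yql = (
        "select * from doc where "
        # Vector OR keyword retrieval; userQuery() picks up the "query" parameter.
        f"(({{targetHits:{hits}}}nearestNeighbor(embedding, q)) or userQuery()) "
        # Structured filter applied to both retrieval paths.
        f"and year >= {int(min_year)}"
    )
    return {
        "yql": yql,
        "query": text,
        "input.query(q)": list(query_vector),
        "ranking": "hybrid",
        "hits": hits,
    }


if __name__ == "__main__":
    print(json.dumps(hybrid_query_body("love songs", [0.1, 0.2], 2000), indent=2))
```

The body is meant to be sent as JSON to the query endpoint; how the vector and keyword scores are blended is decided by the rank profile, not by the query itself.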

What it does

  • Combines vector, tensor, keyword (BM25), and structured search in a single query
  • Built-in ranking framework, including evaluation of machine-learned models at serving time
  • Designed for low-latency retrieval, often under 100 ms, over continuously changing data
  • Scales horizontally across multiple nodes with high availability and parallel query evaluation
  • Vespa CLI plus a JSON document/query API for feeding data and running searches
  • Run locally with Docker or deploy to the managed Vespa Cloud service

Getting started

You do not need to build Vespa from source to use it. The quickest path is the Vespa CLI with a local Docker container running a sample application.

Install the Vespa CLI

Install the CLI with Homebrew, or download a build from the GitHub releases page if Homebrew is not available.

bash
brew install vespa-cli

Start Vespa locally with Docker

Point the CLI at a local target, then run the Vespa container, publishing the query (8080) and config (19071) ports.

bash
vespa config set target local
docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa
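
The container takes a little while to become ready after `docker run` returns. A minimal sketch for waiting on it, assuming the config server exposes a health endpoint at `/state/v1/health` on port 19071 that reports `"code": "up"` once ready; the helper names, timeout, and polling interval are illustrative choices:

```python
import json
import time
import urllib.request


def is_up(health_json):
    """Return True when a health payload reports status code "up"."""
    try:
        return json.loads(health_json)["status"]["code"] == "up"
    except (ValueError, KeyError, TypeError):
        # Malformed or unexpected payload: treat as not ready.
        return False


def wait_for_config_server(url="http://localhost:19071/state/v1/health",
                           timeout_s=300, poll_s=5):
    """Poll the (assumed) health endpoint until it is up or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if is_up(resp.read().decode("utf-8")):
                    return True
        except OSError:
            pass  # connection refused is expected while the container starts
        time.sleep(poll_s)
    return False
```

Calling `wait_for_config_server()` before deploying avoids racing the container startup.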

Deploy a sample app and feed data

Clone the album-recommendation sample application, deploy it, and feed the included documents.

bash
vespa clone album-recommendation myapp && cd myapp
vespa deploy --wait 300 ./app
vespa feed dataset/documents.jsonl
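
To feed your own data instead of the bundled file, you would write one JSON operation per line. The sketch below produces such lines in the `put` format, assuming document IDs of the form `id:<namespace>:<document-type>::<local-id>`. The namespace `mynamespace` and the field values are placeholders; the `music` type and `album` field come from the sample query in this guide, while `artist` and `year` are assumed fields.

```python
import json


def feed_line(namespace, doc_type, doc_id, fields):
    """Return one JSONL feed line: a put operation for a single document."""
    put_id = f"id:{namespace}:{doc_type}::{doc_id}"
    # json.dumps without indent never emits newlines, so the result is one line.
    return json.dumps({"put": put_id, "fields": fields}, ensure_ascii=False)


if __name__ == "__main__":
    docs = [
        ("example-1", {"album": "Example Album", "artist": "Example Artist", "year": 2020}),
        ("example-2", {"album": "Another Album", "artist": "Someone Else", "year": 2021}),
    ]
    with open("my-documents.jsonl", "w", encoding="utf-8") as out:
        for doc_id, fields in docs:
            out.write(feed_line("mynamespace", "music", doc_id, fields) + "\n")
```

The resulting file can then be passed to `vespa feed` in place of the sample dataset, provided the fields match the deployed schema.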

Run a query

Query the indexed documents with the CLI using Vespa's SQL-like query language (YQL).

bash
vespa query "select * from music where album contains 'head'"
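
The same query can also be sent over HTTP, which is what an application would typically do. A minimal sketch, assuming the query endpoint is `/search/` on the published port 8080 and accepts the query string as a `yql` parameter; the helper names are illustrative:

```python
import json
import urllib.parse
import urllib.request


def search_url(yql, base="http://localhost:8080/search/", **params):
    """Build a GET URL for the query endpoint with URL-encoded parameters."""
    query = {"yql": yql, **params}
    return base + "?" + urllib.parse.urlencode(query)


def run_query(yql, **params):
    """Send the query to a locally running Vespa and return the parsed JSON."""
    with urllib.request.urlopen(search_url(yql, **params), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = run_query("select * from music where album contains 'head'", hits=3)
    # Hits are expected under root.children; the list is absent when nothing matches.
    for hit in result.get("root", {}).get("children", []):
        print(hit.get("relevance"), hit.get("fields", {}).get("album"))
```

Building the URL in a separate function keeps the encoding logic testable without a running container.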

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Hybrid search that blends dense vector similarity with BM25 keyword matching and structured filters in one query
  • Retrieval and reranking backend for a RAG pipeline over a large, frequently updated corpus
  • Recommendation and personalization systems that evaluate machine-learned ranking models at serving time
  • Low-latency search at hundreds of thousands of queries per second across a distributed, multi-node cluster

How Vespa compares

Vespa alongside other open-source rerank, search & hybrid tools AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Elasticsearch | ★ 77.1k | Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval. |
| Meilisearch Cloud | ★ 58.2k | Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search. |
| Typesense Cloud | ★ 26.1k | Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API. |
| Tantivy | ★ 15.4k | A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval. |
| FlagEmbedding | ★ 11.8k | BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines. |
| Vespa | ★ 7k | One engine for vector, keyword, and structured search with built-in ranking at scale |
| RAGatouille | ★ 3.9k | A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines. |
| ColBERT | ★ 3.9k | The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching. |