Vespa

One engine for vector, keyword, and structured search with built-in ranking at scale

github.com/vespa-engine/vespa ★ 7k · vespa.ai

Overview

Vespa is an open-source engine for storing, searching, and ranking large amounts of data at serving time. It handles vectors, tensors, text, and structured fields in one place, so you can run vector search, keyword (BM25) search, and filters on structured data within a single query.

It is built for teams who need low-latency retrieval over data that keeps changing, often returning results in under 100 milliseconds while the corpus is updated. Vespa distributes data across multiple nodes and evaluates queries in parallel, which is why it is used on internet services that handle hundreds of thousands of queries per second.

In the RAG and retrieval space, Vespa fits the hybrid-search and reranking role: instead of bolting a vector index onto a separate keyword index, it does both natively and lets you express custom ranking, including machine-learned models, as part of the application config.

What it does

Combines vector, tensor, keyword (BM25), and structured search in a single query
Built-in ranking framework, including evaluation of machine-learned models at serving time
Designed for low-latency retrieval, often under 100 ms, over continuously changing data
Scales horizontally across multiple nodes with high availability and parallel query evaluation
Vespa CLI plus a JSON document/query API for feeding data and running searches
Run locally with Docker or deploy to the managed Vespa Cloud service

Getting started

You do not need to build Vespa from source to use it. The quickest path is the Vespa CLI with a local Docker container running a sample application.

Install the Vespa CLI

Install the CLI with Homebrew, or download a build from the GitHub releases page if Homebrew is not available.

brew install vespa-cli

Start Vespa locally with Docker

Point the CLI at a local target, then run the Vespa container, publishing the query (8080) and config (19071) ports.

vespa config set target local
docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa
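
The container needs some time before its config server accepts a deployment. As a minimal sketch, the helper below polls a URL until it answers successfully. The function name `wait_for_http` and its defaults are illustrative, and the health path `/state/v1/health` on port 19071 is assumed from Vespa's state API rather than taken from this page.

```shell
# Poll a URL until it returns an HTTP success status, or give up.
# Arguments: URL, number of attempts (default 60), delay in seconds (default 5).
wait_for_http() {
  wfh_url=$1
  wfh_attempts=${2:-60}
  wfh_delay=${3:-5}
  wfh_i=0
  while [ "$wfh_i" -lt "$wfh_attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as refused connections.
    if curl --silent --fail --max-time 5 --output /dev/null "$wfh_url"; then
      return 0
    fi
    wfh_i=$((wfh_i + 1))
    sleep "$wfh_delay"
  done
  return 1
}
```

With the container started as above, `wait_for_http http://localhost:19071/state/v1/health && echo "config server is up"` would block until the deploy API is reachable or the attempts run out.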

Deploy a sample app and feed data

Clone the album-recommendation sample application, deploy it, and feed the included documents.

vespa clone album-recommendation myapp && cd myapp
vespa deploy --wait 300 ./app
vespa feed dataset/documents.jsonl
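
To feed your own data, the same command accepts a JSONL file with one operation per line. The sketch below writes a single `put` operation. The file name, document ID, and field values are made up for illustration, and it assumes the sample's `music` schema has `album`, `artist`, and `year` fields and uses the `mynamespace` namespace.

```shell
# One document operation per line. The ID has the form
# id:<namespace>:<document-type>::<local-id>; the values below are invented.
cat > extra-documents.jsonl <<'EOF'
{"put": "id:mynamespace:music::example-album-1", "fields": {"album": "Example Album", "artist": "Example Artist", "year": 2020}}
EOF

# Show what would be fed.
cat extra-documents.jsonl
```

With the application deployed, `vespa feed extra-documents.jsonl` would add the document next to the sample data.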

Run a query

Query the indexed documents with the CLI using YQL, Vespa's SQL-like query language.

vespa query "select * from music where album contains 'head'"
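
The same query can also go through the JSON query API on port 8080 without the CLI. This is a sketch under assumptions: the `/search/` path and the `yql` and `hits` parameters come from Vespa's query API, not from this page, and `send_query` is a hypothetical helper name.

```shell
# The YQL string is the same one passed to `vespa query`.
yql="select * from music where album contains 'head'"

# Minimal JSON request body; "hits" caps the number of results.
# Note: printf does no JSON escaping, so a YQL string containing
# double quotes or backslashes would need escaping first.
body=$(printf '{"yql": "%s", "hits": %d}' "$yql" 5)
echo "$body"

# POST the body to the query endpoint published by the container.
send_query() {
  curl --silent --max-time 10 \
    --header 'Content-Type: application/json' \
    --data "$1" \
    "http://localhost:8080/search/"
}
```

Once the container is running and data is fed, `send_query "$body"` should return the matching documents as JSON.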

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

Hybrid search that blends dense vector similarity with BM25 keyword matching and structured filters in one query
Retrieval and reranking backend for a RAG pipeline over a large, frequently updated corpus
Recommendation and personalization systems that evaluate machine-learned ranking models at serving time
Low-latency search at hundreds of thousands of queries per second on a distributed, multi-node cluster
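
For the ranking use cases above, a query can select a rank profile and pass query-time inputs to it. The sketch below assumes the album-recommendation sample from Getting started still defines a rank profile named `rank_albums` with a `query(user_profile)` tensor input; check the app's schema before relying on those names. `run_ranked_query` is a hypothetical helper.

```shell
# Match every document and let the rank profile decide the order.
yql="select * from music where true"

# Extra request parameters are passed to the CLI as name=value pairs.
profile="ranking=rank_albums"
user_profile="input.query(user_profile)={pop:0.8,rock:0.2,jazz:0.1}"

# Print the parameters that will be sent.
printf '%s\n' "$yql" "$profile" "$user_profile"

# Requires the Vespa CLI and the running sample application.
run_ranked_query() {
  vespa query "$yql" "$profile" "$user_profile"
}
```

Calling `run_ranked_query` against the running sample app should order albums by how well their category scores match the given profile weights.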

How Vespa compares

Vespa alongside the other open-source reranking, search, and hybrid-retrieval tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Elasticsearch	★ 77.1k	Distributed search and analytics engine with a built-in vector database for dense/sparse embeddings and hybrid keyword-plus-semantic retrieval.
Meilisearch Cloud	★ 58.2k	Managed cloud for the Meilisearch engine, combining fast full-text search with hybrid, semantic, and multimodal vector search.
Typesense Cloud	★ 26.1k	Managed hosting for the Typesense search engine, offering typo-tolerant keyword search plus vector and semantic search via a simple API.
Tantivy	★ 15.4k	A fast full-text search engine library in Rust that provides BM25 keyword search for the lexical half of hybrid retrieval.
FlagEmbedding	★ 11.8k	BAAI's retrieval toolkit that provides the BGE embedding and cross-encoder reranker models used widely in RAG pipelines.
Vespa	★ 7k	One engine for vector, keyword, and structured search with built-in ranking at scale.
RAGatouille	★ 3.9k	A wrapper that makes it easy to train and use ColBERT late-interaction retrieval inside RAG pipelines.
ColBERT	★ 3.9k	The reference implementation of ColBERT late-interaction retrieval, which ranks passages using token-level vector matching.
