What Is an IVF Index? Clustering Vectors for Faster Search

Q: What do nprobe and nlist mean in IVF?

`nlist` is the number of clusters (buckets) you create when training the index — set once at build time. `nprobe` is how many of those buckets each search opens — set per query. Raising `nprobe` improves recall but slows search, and you can change it on every query without rebuilding the index.

Q: Why might IVF miss the nearest neighbor?

IVF only scans the `nprobe` clusters closest to the query, so if the true nearest neighbor sits in a bucket you didn't probe — common when a query lands near a cluster boundary — it is missed. Raising `nprobe` opens more buckets and recovers more of these boundary cases at the cost of slower search.

Learn how an IVF index partitions vectors into clusters so search only scans the few buckets near your query, and how to tune nlist and nprobe.

INTERMEDIATE11 MIN READUPDATED 2026-06-13

In plain English

A vector database stores your data as embeddings — long lists of numbers where similar items sit close together. To answer "find the items most like this query," the simplest method compares the query against every stored vector. That's exact, but with ten million vectors it's painfully slow. An IVF index is one of the classic tricks for skipping most of that work.

IVF Index — illustration — IVF Index — assets.zilliz.com

IVF stands for inverted file index. The idea is to sort your vectors into buckets ahead of time, then at search time only open the few buckets nearest your query. You trade a tiny chance of missing the true best match for a search that's often 10–100x faster.

Think of a giant grocery warehouse. Without organization, finding peanut butter means walking past every single product. Instead the warehouse groups items into aisles — dairy, bakery, snacks. When you want peanut butter you walk straight to the snacks aisle and ignore the rest of the building. IVF does exactly this for vectors: it carves the space into clusters (the aisles), figures out which clusters your query belongs near, and searches only inside those.

Why it matters

The core problem IVF solves is the cost of brute-force search. Comparing a query to every vector is called a flat or exact search. It always returns the true nearest neighbors, but its cost grows linearly with the number of vectors — double your data, double your search time. At a few thousand vectors nobody notices. At tens of millions, every query becomes too slow and too expensive to serve in real time.

IVF belongs to the family of approximate nearest neighbor (ANN) methods. "Approximate" is the key word: instead of guaranteeing the perfect top results, it returns almost always the right results while scanning a small slice of your data. For semantic search, recommendations, and RAG retrieval, that tradeoff is almost always worth it — a 99% chance of the correct neighbors at 50x the speed beats a perfect answer that arrives too late.

Why IVF specifically, and not its famous rival HNSW? Three reasons builders reach for IVF:

Lower memory. IVF stores a flat list of vectors plus a small table of cluster centers. HNSW builds a graph with many links per vector, which can eat several times more RAM. On a large dataset that gap decides whether your index fits in memory at all.
Pairs naturally with compression. IVF combines cleanly with product quantization (the IVFPQ variant) to shrink each vector to a few bytes, letting billions of vectors fit on one machine.
Simple, tunable knobs. IVF exposes exactly two dials — how many buckets to create and how many to search — that map directly to the speed-versus-accuracy tradeoff. That makes its behavior easy to reason about and adjust.

How it works

IVF has a one-time build phase and a per-query search phase. The build phase clusters your vectors; the search phase routes each query to the nearest clusters and scans only those.

Build: train, then add

First you train the index. You pick nlist — the number of clusters (buckets) you want — and run a clustering algorithm (k-means) over a sample of your vectors. K-means finds nlist centroids: representative center points, one per cluster. These centroids are the "aisle signs." The component that holds them and decides which cluster a vector belongs to is called the coarse quantizer — "coarse" because it only sorts vectors into broad buckets, not exact positions.

Then you add your vectors. Each vector is assigned to the cluster whose centroid is closest to it, and appended to that cluster's list (its inverted list). After this, every vector lives in exactly one bucket, and the index knows, for each centroid, which vectors fell under it.

// Build — done once, up front

Pick nliste.g. 4096 clustersTrain (k-means)find nlist centroidsAdd vectorsassign each to nearest centroidInverted listscentroid → its vectors

Search: probe the nearest buckets

When a query arrives, IVF first compares it against the nlist centroids — a cheap step, since there are far fewer centroids than vectors. It picks the nprobe closest centroids, opens only those buckets, and does an exact comparison against the vectors inside them. The best matches from those buckets become your results. Everything in the un-probed clusters is simply never looked at.

// Search — every query

Query vectorthe embedding to matchCompare to centroidsonly nlist of themPick nprobe clustersthe nearest bucketsScan those listsexact compare insideTop-k resultsbest matches found

The whole speedup comes from one number: how many vectors you skip. If you have 10 million vectors split into 4,096 even clusters, each holds about 2,400 vectors. With nprobe = 8 you scan roughly 8 × 2,400 ≈ 19,000 vectors instead of 10,000,000 — under 0.2% of the data. That's the magic, and it's also the risk: if the true best match happens to live in a cluster you didn't probe, you miss it.

The two knobs: nlist and nprobe

Almost all of IVF tuning is balancing these two parameters. They control different things and you set them at different times.

Knob	Set when	What it controls	Turn it up to…
`nlist`	At build / train time	How many clusters exist (bucket count)	Make each bucket smaller, so scanning a bucket is faster — but centroid comparison costs more
`nprobe`	At query time	How many clusters each search opens	Scan more buckets → higher recall, slower search

The crucial point: nprobe is the live accuracy dial. You fix nlist when you build the index, but you can change nprobe on every single query without rebuilding anything. Want more accurate results for an important query? Raise nprobe. Need lower latency? Lower it. At nprobe = nlist you probe every bucket, which is exact search again — slow, but a useful sanity check that your index isn't losing matches for some other reason.

A common rule of thumb for nlist is to scale it with the square root of your vector count (roughly 4 * sqrt(N) is a frequently cited starting point), then tune nprobe empirically against a recall target. There's no universal best value — the right pair depends on your data's shape and how much accuracy you're willing to trade for speed.

// How the knobs trade off

More nprobe

Opens more buckets per query
Higher recall (fewer missed matches)
Slower, more vectors scanned
Changeable per query, no rebuild

More nlist

Splits data into smaller buckets
Each bucket scan is cheaper
But more centroids to compare first
Fixed at build time; rebuild to change

IVF vs HNSW: which to pick

IVF and HNSW are the two ANN index types you'll meet most often, and they take opposite approaches. IVF is cluster-based: it partitions the space and scans nearby partitions. HNSW is graph-based: it builds a navigable network of vectors and walks the graph toward the query. Neither is universally better — they fit different constraints.

	IVF	HNSW
Approach	Cluster the space, probe nearby buckets	Walk a graph of nearest-neighbor links
Training	Required (k-means up front)	None — builds as you insert
Memory use	Low — flat lists + centroids	Higher — many graph links per vector
Main speed knob	`nprobe` (per query)	`ef_search` (per query)
Insert / update	Easy to add; drifts as data shifts	Easy to add; no retraining needed
Compression	Pairs well with PQ (`IVFPQ`)	Harder to compress as effectively
Best when	Huge datasets, tight RAM budget	Need top recall at low latency

A practical way to choose: if RAM is your binding constraint and your dataset is very large, IVF — especially IVFPQ — lets you fit far more vectors per machine. If you want the highest recall at the lowest query latency and can afford the memory, HNSW usually wins, which is why it's the default in many vector databases. Both are approximate; both have a per-query knob; the difference is the data structure underneath and what it costs you in memory.

One more wrinkle on quality. With IVF, an inherent weakness is the cluster boundary problem: if your query sits right between two clusters, the true nearest neighbor might fall just inside a bucket you didn't probe. Raising nprobe covers more boundary cases. HNSW sidesteps this because its graph links cross cluster-like regions, but it pays for that in memory and build complexity.

IVFFlat vs IVFPQ in practice

"IVF" describes how vectors are partitioned. What you store inside each bucket is a separate choice, and it gives you two common variants you'll see named in FAISS and pgvector.

IVFFlat stores the full, uncompressed vectors in each bucket. Search inside a probed bucket is an exact comparison, so recall is high — but you pay full memory for every vector. This is the ivfflat index type in pgvector.
IVFPQ adds product quantization: it compresses each vector into a short code of a few bytes before storing it. Comparisons inside a bucket use the compressed codes, so the index is dramatically smaller (billions of vectors on one box) at the cost of some accuracy from the lossy compression.

Here's the shape of building and searching an IVFFlat index with FAISS. Notice the explicit train step and the per-query nprobe.

ivf_faiss.pypython

import faiss
import numpy as np

d = 768            # embedding dimension
nlist = 4096       # number of clusters (buckets)
N = 1_000_000

# Your corpus of embeddings, L2-normalized for cosine-style search.
vectors = np.random.rand(N, d).astype("float32")

# The coarse quantizer measures distance to each centroid.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# 1) TRAIN: learn nlist centroids from a representative sample.
index.train(vectors)        # required before adding

# 2) ADD: assign every vector to its nearest centroid's bucket.
index.add(vectors)

# 3) SEARCH: nprobe is the live accuracy/speed dial.
index.nprobe = 16           # open the 16 nearest buckets per query
query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, k=5)
print(ids)

# To switch to compressed IVFPQ, build it like this instead:
# m = 64  # bytes per compressed vector
# index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)

Going deeper

The basic build-then-probe picture is enough to use IVF well. A few deeper realities matter once you run it in production.

Clusters drift as your data changes. The centroids are frozen at training time. If your data distribution shifts — new topics, new languages, a different mix of documents — the original clusters stop reflecting reality, buckets grow uneven, and recall slowly degrades. Unlike HNSW, IVF doesn't self-adjust. The fix is periodic retraining and reindexing on fresh data, which you should schedule, not wait to discover from falling quality.

Uneven buckets hurt both speed and recall. K-means rarely produces equal-sized clusters; some buckets end up far larger than others. A query routed to a giant bucket scans many vectors (slow), while a query near a sparse region may need a higher nprobe to find enough candidates (lower recall at a fixed setting). Watching the size distribution of your inverted lists is a good health check.

Over-fetch, then re-rank. Because IVF is approximate, a common pattern is to retrieve more candidates than you need (say top-50), then re-rank them with the exact metric or a heavier model to recover quality. This is especially valuable with IVFPQ, where the compressed comparison is rough — you let IVF narrow the field cheaply, then refine the survivors precisely.

It composes with other tricks. IVFPQ is the most common compound form, and FAISS lets you stack further options (residual encoding, an HNSW coarse quantizer to speed up centroid lookup, and so on). You don't need these to start, but it's why IVF scales to billions of vectors where a flat index would be hopeless.

The durable lesson: IVF turns one big exact search into a tiny exact search over a smart shortlist. Everything you tune — nlist, nprobe, the storage variant, the retrain cadence — is just managing the same tradeoff between how much you scan and how often you're right. To zoom back out, see approximate nearest neighbor search for the whole family, or comparing vector databases for how different systems expose these indexes.

FAQ

What is an IVF index in vector search?

IVF (inverted file index) is an approximate nearest neighbor method that clusters your vectors into buckets at build time, then at query time only scans the few buckets nearest the query instead of every vector. This trades a small chance of missing the true best match for search that is often 10–100x faster. It powers FAISS and pgvector's ivfflat.

What do nprobe and nlist mean in IVF?

nlist is the number of clusters (buckets) you create when training the index — set once at build time. nprobe is how many of those buckets each search opens — set per query. Raising nprobe improves recall but slows search, and you can change it on every query without rebuilding the index.

Is IVF better than HNSW?

Neither is universally better. IVF uses less memory and pairs well with compression (IVFPQ), so it suits very large datasets on tight RAM budgets, but it needs training and can drift as data changes. HNSW needs no training and usually gives higher recall at lower latency, but uses more memory. Pick based on your memory budget and recall needs.

What is the difference between IVFFlat and IVFPQ?

Both partition vectors into IVF clusters; the difference is what they store inside each bucket. IVFFlat keeps full uncompressed vectors, giving high accuracy but full memory cost. IVFPQ compresses each vector with product quantization into a few bytes, making the index far smaller at some loss of accuracy — good for fitting billions of vectors.

Does an IVF index need to be trained?

Yes. IVF must be trained before you add data, because training runs k-means to find the cluster centroids it uses to route vectors and queries. This is a key difference from HNSW, which needs no training. Train on a representative sample of your vectors, and retrain periodically if your data distribution shifts over time.

Why might IVF miss the nearest neighbor?

IVF only scans the nprobe clusters closest to the query, so if the true nearest neighbor sits in a bucket you didn't probe — common when a query lands near a cluster boundary — it is missed. Raising nprobe opens more buckets and recovers more of these boundary cases at the cost of slower search.

// In plain English

// Why it matters

// How it works

Build: train, then add

Search: probe the nearest buckets

// The two knobs: nlist and nprobe

// IVF vs HNSW: which to pick

// IVFFlat vs IVFPQ in practice

// Going deeper

// FAQ

// Further reading

// Related