In plain English
A vector database stores your data as embeddings — long lists of numbers where similar items sit close together. To answer "find the items most like this query," the simplest method compares the query against every stored vector. That's exact, but with ten million vectors it's painfully slow. An IVF index is one of the classic tricks for skipping most of that work.

IVF stands for inverted file index. The idea is to sort your vectors into buckets ahead of time, then at search time only open the few buckets nearest your query. You trade a tiny chance of missing the true best match for a search that's often 10–100x faster.
Think of a giant grocery warehouse. Without organization, finding peanut butter means walking past every single product. Instead the warehouse groups items into aisles — dairy, bakery, snacks. When you want peanut butter you walk straight to the snacks aisle and ignore the rest of the building. IVF does exactly this for vectors: it carves the space into clusters (the aisles), figures out which clusters your query belongs near, and searches only inside those.
Why it matters
The core problem IVF solves is the cost of brute-force search. Comparing a query to every vector is called a flat or exact search. It always returns the true nearest neighbors, but its cost grows linearly with the number of vectors — double your data, double your search time. At a few thousand vectors nobody notices. At tens of millions, every query becomes too slow and too expensive to serve in real time.
IVF belongs to the family of approximate nearest neighbor (ANN) methods. "Approximate" is the key word: instead of guaranteeing the perfect top results, it returns almost always the right results while scanning a small slice of your data. For semantic search, recommendations, and RAG retrieval, that tradeoff is almost always worth it — a 99% chance of the correct neighbors at 50x the speed beats a perfect answer that arrives too late.
Why IVF specifically, and not its famous rival HNSW? Three reasons builders reach for IVF:
- Lower memory. IVF stores a flat list of vectors plus a small table of cluster centers. HNSW builds a graph with many links per vector, which can eat several times more RAM. On a large dataset that gap decides whether your index fits in memory at all.
- Pairs naturally with compression. IVF combines cleanly with product quantization (the
IVFPQvariant) to shrink each vector to a few bytes, letting billions of vectors fit on one machine. - Simple, tunable knobs. IVF exposes exactly two dials — how many buckets to create and how many to search — that map directly to the speed-versus-accuracy tradeoff. That makes its behavior easy to reason about and adjust.
How it works
IVF has a one-time build phase and a per-query search phase. The build phase clusters your vectors; the search phase routes each query to the nearest clusters and scans only those.
Build: train, then add
First you train the index. You pick nlist — the number of clusters (buckets) you want — and run a clustering algorithm (k-means) over a sample of your vectors. K-means finds nlist centroids: representative center points, one per cluster. These centroids are the "aisle signs." The component that holds them and decides which cluster a vector belongs to is called the coarse quantizer — "coarse" because it only sorts vectors into broad buckets, not exact positions.
Then you add your vectors. Each vector is assigned to the cluster whose centroid is closest to it, and appended to that cluster's list (its inverted list). After this, every vector lives in exactly one bucket, and the index knows, for each centroid, which vectors fell under it.
Search: probe the nearest buckets
When a query arrives, IVF first compares it against the nlist centroids — a cheap step, since there are far fewer centroids than vectors. It picks the nprobe closest centroids, opens only those buckets, and does an exact comparison against the vectors inside them. The best matches from those buckets become your results. Everything in the un-probed clusters is simply never looked at.
The whole speedup comes from one number: how many vectors you skip. If you have 10 million vectors split into 4,096 even clusters, each holds about 2,400 vectors. With nprobe = 8 you scan roughly 8 × 2,400 ≈ 19,000 vectors instead of 10,000,000 — under 0.2% of the data. That's the magic, and it's also the risk: if the true best match happens to live in a cluster you didn't probe, you miss it.
The two knobs: nlist and nprobe
Almost all of IVF tuning is balancing these two parameters. They control different things and you set them at different times.
| Knob | Set when | What it controls | Turn it up to… |
|---|---|---|---|
nlist | At build / train time | How many clusters exist (bucket count) | Make each bucket smaller, so scanning a bucket is faster — but centroid comparison costs more |
nprobe | At query time | How many clusters each search opens | Scan more buckets → higher recall, slower search |
The crucial point: nprobe is the live accuracy dial. You fix nlist when you build the index, but you can change nprobe on every single query without rebuilding anything. Want more accurate results for an important query? Raise nprobe. Need lower latency? Lower it. At nprobe = nlist you probe every bucket, which is exact search again — slow, but a useful sanity check that your index isn't losing matches for some other reason.
A common rule of thumb for nlist is to scale it with the square root of your vector count (roughly 4 * sqrt(N) is a frequently cited starting point), then tune nprobe empirically against a recall target. There's no universal best value — the right pair depends on your data's shape and how much accuracy you're willing to trade for speed.
- Opens more buckets per query
- Higher recall (fewer missed matches)
- Slower, more vectors scanned
- Changeable per query, no rebuild
- Splits data into smaller buckets
- Each bucket scan is cheaper
- But more centroids to compare first
- Fixed at build time; rebuild to change
IVF vs HNSW: which to pick
IVF and HNSW are the two ANN index types you'll meet most often, and they take opposite approaches. IVF is cluster-based: it partitions the space and scans nearby partitions. HNSW is graph-based: it builds a navigable network of vectors and walks the graph toward the query. Neither is universally better — they fit different constraints.
| IVF | HNSW | |
|---|---|---|
| Approach | Cluster the space, probe nearby buckets | Walk a graph of nearest-neighbor links |
| Training | Required (k-means up front) | None — builds as you insert |
| Memory use | Low — flat lists + centroids | Higher — many graph links per vector |
| Main speed knob | nprobe (per query) | ef_search (per query) |
| Insert / update | Easy to add; drifts as data shifts | Easy to add; no retraining needed |
| Compression | Pairs well with PQ (IVFPQ) | Harder to compress as effectively |
| Best when | Huge datasets, tight RAM budget | Need top recall at low latency |
A practical way to choose: if RAM is your binding constraint and your dataset is very large, IVF — especially IVFPQ — lets you fit far more vectors per machine. If you want the highest recall at the lowest query latency and can afford the memory, HNSW usually wins, which is why it's the default in many vector databases. Both are approximate; both have a per-query knob; the difference is the data structure underneath and what it costs you in memory.
One more wrinkle on quality. With IVF, an inherent weakness is the cluster boundary problem: if your query sits right between two clusters, the true nearest neighbor might fall just inside a bucket you didn't probe. Raising nprobe covers more boundary cases. HNSW sidesteps this because its graph links cross cluster-like regions, but it pays for that in memory and build complexity.
IVFFlat vs IVFPQ in practice
"IVF" describes how vectors are partitioned. What you store inside each bucket is a separate choice, and it gives you two common variants you'll see named in FAISS and pgvector.
- IVFFlat stores the full, uncompressed vectors in each bucket. Search inside a probed bucket is an exact comparison, so recall is high — but you pay full memory for every vector. This is the
ivfflatindex type in pgvector. - IVFPQ adds product quantization: it compresses each vector into a short code of a few bytes before storing it. Comparisons inside a bucket use the compressed codes, so the index is dramatically smaller (billions of vectors on one box) at the cost of some accuracy from the lossy compression.
Here's the shape of building and searching an IVFFlat index with FAISS. Notice the explicit train step and the per-query nprobe.
import faiss
import numpy as np
d = 768 # embedding dimension
nlist = 4096 # number of clusters (buckets)
N = 1_000_000
# Your corpus of embeddings, L2-normalized for cosine-style search.
vectors = np.random.rand(N, d).astype("float32")
# The coarse quantizer measures distance to each centroid.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
# 1) TRAIN: learn nlist centroids from a representative sample.
index.train(vectors) # required before adding
# 2) ADD: assign every vector to its nearest centroid's bucket.
index.add(vectors)
# 3) SEARCH: nprobe is the live accuracy/speed dial.
index.nprobe = 16 # open the 16 nearest buckets per query
query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, k=5)
print(ids)
# To switch to compressed IVFPQ, build it like this instead:
# m = 64 # bytes per compressed vector
# index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)Going deeper
The basic build-then-probe picture is enough to use IVF well. A few deeper realities matter once you run it in production.
Clusters drift as your data changes. The centroids are frozen at training time. If your data distribution shifts — new topics, new languages, a different mix of documents — the original clusters stop reflecting reality, buckets grow uneven, and recall slowly degrades. Unlike HNSW, IVF doesn't self-adjust. The fix is periodic retraining and reindexing on fresh data, which you should schedule, not wait to discover from falling quality.
Uneven buckets hurt both speed and recall. K-means rarely produces equal-sized clusters; some buckets end up far larger than others. A query routed to a giant bucket scans many vectors (slow), while a query near a sparse region may need a higher nprobe to find enough candidates (lower recall at a fixed setting). Watching the size distribution of your inverted lists is a good health check.
Over-fetch, then re-rank. Because IVF is approximate, a common pattern is to retrieve more candidates than you need (say top-50), then re-rank them with the exact metric or a heavier model to recover quality. This is especially valuable with IVFPQ, where the compressed comparison is rough — you let IVF narrow the field cheaply, then refine the survivors precisely.
It composes with other tricks. IVFPQ is the most common compound form, and FAISS lets you stack further options (residual encoding, an HNSW coarse quantizer to speed up centroid lookup, and so on). You don't need these to start, but it's why IVF scales to billions of vectors where a flat index would be hopeless.
The durable lesson: IVF turns one big exact search into a tiny exact search over a smart shortlist. Everything you tune — nlist, nprobe, the storage variant, the retrain cadence — is just managing the same tradeoff between how much you scan and how often you're right. To zoom back out, see approximate nearest neighbor search for the whole family, or comparing vector databases for how different systems expose these indexes.
FAQ
What is an IVF index in vector search?
IVF (inverted file index) is an approximate nearest neighbor method that clusters your vectors into buckets at build time, then at query time only scans the few buckets nearest the query instead of every vector. This trades a small chance of missing the true best match for search that is often 10–100x faster. It powers FAISS and pgvector's ivfflat.
What do nprobe and nlist mean in IVF?
nlist is the number of clusters (buckets) you create when training the index — set once at build time. nprobe is how many of those buckets each search opens — set per query. Raising nprobe improves recall but slows search, and you can change it on every query without rebuilding the index.
Is IVF better than HNSW?
Neither is universally better. IVF uses less memory and pairs well with compression (IVFPQ), so it suits very large datasets on tight RAM budgets, but it needs training and can drift as data changes. HNSW needs no training and usually gives higher recall at lower latency, but uses more memory. Pick based on your memory budget and recall needs.
What is the difference between IVFFlat and IVFPQ?
Both partition vectors into IVF clusters; the difference is what they store inside each bucket. IVFFlat keeps full uncompressed vectors, giving high accuracy but full memory cost. IVFPQ compresses each vector with product quantization into a few bytes, making the index far smaller at some loss of accuracy — good for fitting billions of vectors.
Does an IVF index need to be trained?
Yes. IVF must be trained before you add data, because training runs k-means to find the cluster centroids it uses to route vectors and queries. This is a key difference from HNSW, which needs no training. Train on a representative sample of your vectors, and retrain periodically if your data distribution shifts over time.
Why might IVF miss the nearest neighbor?
IVF only scans the nprobe clusters closest to the query, so if the true nearest neighbor sits in a bucket you didn't probe — common when a query lands near a cluster boundary — it is missed. Raising nprobe opens more buckets and recovers more of these boundary cases at the cost of slower search.