In plain English
A vector database stores numbers that represent meaning — embeddings — and finds the ones closest to a query. That's it. Choosing one should be simple. Instead it feels like buying a car when every dealer hands you a 60-row spec sheet and insists their model wins.
Here's the trap: those spec sheets compare on dimensions you'll never feel. Index types, distance metrics, recall benchmarks at a billion vectors — all real, all mostly irrelevant when you have 50,000 documents and a deadline. The features that actually decide your choice are boring: how many vectors you have, whether you can run a server, what database you already use, and how much filtering you need.
Think of it like picking where to store boxes. A few dozen boxes go in your closet (a library in your app's memory). A few hundred go in the garage (a table in your existing database). A storage unit across town makes sense only once you've outgrown the garage and need climate control, an inventory system, and someone else handling the locks (a dedicated managed service). Most people rent the storage unit far too early.
Why it matters
Vector search is the retrieval engine behind RAG, semantic search, recommendations, and agent memory. If your app answers questions over documents, suggests similar items, or remembers past conversations, a vector store sits in the hot path of every request. Pick wrong and you pay for it in one of three currencies.
- Money. A dedicated managed vector database can cost real money per month at idle, before you serve a single query. For a prototype with 20,000 chunks, that's pure waste — the same data fits in a Postgres table you're already paying for.
- Operational drag. A self-hosted cluster you have to deploy, monitor, back up, and upgrade is a second database to babysit. That's fine if you have a platform team. It's a tax if you're a solo builder shipping a side project.
- Painful migrations. Outgrow your choice and you re-embed and re-load millions of vectors into a new system, rewrite query code, and re-tune filters. Choosing for today's scale with a clear upgrade path beats over-building for a scale you may never reach.
What did the framework replace? The old default of "everyone uses Pinecone for RAG." That was reasonable when dedicated services were the only mature option. Today the Postgres extension pgvector means many apps never need a separate vector database at all, and strong open-source engines like Qdrant, Weaviate, and Milvus cover the rest. The decision got richer — so a rule of thumb beats a brand name.
How it works
The framework is five questions, answered in order. Each one prunes the field. By the time you reach the end, you're usually staring at one or two real options instead of a dozen.
1. How many vectors, really?
Count your chunks, not your documents — a 50-page PDF might become 200 vectors. This single number eliminates most of the field. Under ~10,000 vectors, a brute-force library in memory is instant and you need no database. Up to a few million, pgvector inside Postgres handles it comfortably. Past tens of millions, or when you need sharding across machines, a dedicated engine earns its keep.
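To make the "no database under ~10,000 vectors" claim concrete, here is a minimal sketch of exact, brute-force search in plain Python. The document IDs and 3-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), and a production version would likely use NumPy or FAISS rather than Python lists.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=3):
    """Exact top-k: score every stored vector and keep the k best."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Hypothetical toy "embeddings", for illustration only.
store = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "returns-howto": [0.8, 0.2, 0.1],
}
top = brute_force_search([1.0, 0.0, 0.0], store, k=2)
print([doc_id for _, doc_id in top])
# -> ['refund-policy', 'returns-howto']
```

Every query touches every vector, so cost grows linearly with corpus size. That is fine for a small, fixed set and is the reason an index only starts to matter at larger scale.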
2. Managed service or self-hosted?
Managed means a vendor runs the servers, handles scaling and backups, and bills you — Pinecone is the archetype; Qdrant, Weaviate, and Milvus (as Zilliz Cloud) all offer hosted tiers too. Self-hosted means you deploy the open-source engine on your own infrastructure: cheaper at scale, but it's yours to operate. No platform team and no on-call rotation? Lean managed. Strong infra muscle and cost pressure at scale? Self-host.
3. What's already in your stack?
The cheapest new system is no new system. Already on Postgres? Add pgvector and keep your vectors next to your relational data — one database to back up, JOINs for free, transactions that actually mean something. Already running Elasticsearch or OpenSearch? Both do vector search natively. Reuse beats greenfield almost every time at small and medium scale.
4. How much metadata filtering do you need?
Real queries are rarely "find similar." They're "find similar where user = me, and doc is not archived, and date is this year." That's filtered vector search, and engines differ sharply in how well they combine a metadata filter with the similarity search. Qdrant, Weaviate, and Milvus treat rich filtering as a first-class feature. pgvector leans on Postgres's mature SQL WHERE clauses and indexes. If filtering is central, weight this heavily.
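One reason engines differ here is the order in which the filter and the similarity search run. The sketch below is a toy illustration, not any engine's actual implementation: exact search stands in for the ANN index, vectors are assumed to be roughly normalized, and the tenant names and rows are hypothetical.

```python
def similarity(a, b):
    """Dot product; assumes vectors are already (roughly) normalized."""
    return sum(x * y for x, y in zip(a, b))


def post_filter_search(query, rows, allowed_tenant, k):
    """Take the top-k by similarity first, then drop rows that fail the filter."""
    ranked = sorted(rows, key=lambda r: similarity(query, r["vec"]), reverse=True)
    return [r["id"] for r in ranked[:k] if r["tenant"] == allowed_tenant]


def pre_filter_search(query, rows, allowed_tenant, k):
    """Apply the filter first, then take the top-k among the survivors."""
    candidates = [r for r in rows if r["tenant"] == allowed_tenant]
    ranked = sorted(candidates, key=lambda r: similarity(query, r["vec"]), reverse=True)
    return [r["id"] for r in ranked[:k]]


# Hypothetical rows: two tenants sharing one collection.
rows = [
    {"id": "a1", "tenant": "acme", "vec": [1.0, 0.0]},
    {"id": "b1", "tenant": "globex", "vec": [0.99, 0.14]},
    {"id": "b2", "tenant": "globex", "vec": [0.98, 0.2]},
    {"id": "a2", "tenant": "acme", "vec": [0.6, 0.8]},
]
query = [1.0, 0.0]
post = post_filter_search(query, rows, "acme", k=2)
pre = pre_filter_search(query, rows, "acme", k=2)
print(post)  # -> ['a1']
print(pre)   # -> ['a1', 'a2']
```

Asked for two results, the post-filtering version returns one, because another tenant's vectors crowded the top-k before the filter ran. This is the kind of behavior worth testing against your own data before you commit to a store.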
5. Do you need hybrid search?
Hybrid search blends vector similarity with old-school keyword (lexical) search, which catches exact terms — product codes, names, error strings — that embeddings sometimes miss. If your content is full of those, you want a store with built-in hybrid support. Qdrant, Weaviate, Milvus, and Elasticsearch/OpenSearch all offer it; with pgvector you bolt on Postgres full-text search yourself. (See retrieval and reranking for why hybrid often wins.)
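If you do end up combining the two result lists yourself, one common way to merge them is reciprocal rank fusion (RRF). The sketch below is an assumption about how you might do it, not a description of what any particular engine does internally. The constant `k=60` is a conventional default, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results for a query that contains an exact product code.
vector_hits = ["doc-overview", "doc-sku-4471", "doc-faq"]
keyword_hits = ["doc-sku-4471", "doc-changelog", "doc-overview"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
# -> ['doc-sku-4471', 'doc-overview', 'doc-changelog', 'doc-faq']
```

RRF only looks at rank positions, so you don't have to reconcile a cosine similarity with a keyword score on a different scale. The document that both searches rank highly rises to the top.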
The contenders at a glance
Six names cover almost every real decision. Here's what each one actually is and where it shines — no benchmark theater.
| Tool | What it is | Best when |
|---|---|---|
| FAISS | A library (not a DB) from Meta for fast similarity search in memory | Prototyping, research, fully in-process search over a fixed set |
| pgvector | A Postgres extension adding a vector type + ANN indexes | You already use Postgres and have up to a few million vectors |
| Pinecone | A fully-managed, serverless vector database | You want zero ops and will pay a vendor to handle scale |
| Qdrant | Open-source engine (Rust) with strong filtering; cloud tier available | Rich metadata filters, hybrid search, self-host or managed |
| Weaviate | Open-source DB with built-in hybrid search and modules; cloud tier | Hybrid search and an opinionated, batteries-included feel |
| Milvus | Open-source engine built for massive scale; Zilliz Cloud managed | Tens of millions to billions of vectors, distributed clusters |
Note the first row: FAISS is a library, not a database. Out of the box it has no server, no network API, no metadata filtering, and no managed persistence layer (you save and load index files yourself). It's the search algorithm other tools wrap, and several managed and open-source databases use FAISS-style indexes under the hood. Reaching for FAISS when you need a database, or for Pinecone when an in-memory array would do, is the most common mismatch.
Managed:
- Vendor runs the servers
- Scales without your effort
- Predictable but ongoing cost
- Less control, possible lock-in
- Fastest path to production

Self-hosted:
- You deploy and operate it
- Cheaper per vector at scale
- Full control + portability
- You own backups + upgrades
- Needs infra/ops capacity
A decision in code
You can encode the framework as a function. Feed it your numbers; it returns a recommendation. This isn't gospel — it's a sanity check that forces you to name your real constraints instead of cargo-culting a brand.
```python
def choose_vector_db(
    num_vectors: int,
    already_using_postgres: bool,
    can_self_host: bool,
    needs_heavy_filtering: bool,
) -> str:
    # Q1: tiny corpus, fixed set -> no database at all
    if num_vectors < 10_000:
        return "FAISS in memory (no DB needed yet)"
    # Q3: reuse the stack you already pay for
    if num_vectors < 5_000_000 and already_using_postgres:
        return "pgvector (keep vectors next to your data)"
    # Q1 again: past tens of millions, you need a dedicated engine
    if num_vectors > 50_000_000:
        return "Milvus (built for massive, distributed scale)"
    # Q2 + Q4: medium scale -> managed vs self-host
    if not can_self_host:
        return "Pinecone (managed, zero ops)"
    if needs_heavy_filtering:
        return "Qdrant (self-host, first-class filtering)"
    return "Qdrant or Weaviate (self-host, open-source)"


# A solo dev shipping a RAG app over 80k doc chunks, on Postgres:
print(choose_vector_db(80_000, True, True, False))
# -> pgvector (keep vectors next to your data)
```

The thresholds are deliberately round, not laws of physics; a million here or there won't break anything. What matters is the order of the checks: corpus size and existing stack do the heavy lifting, and the managed-vs-self-host choice only comes up once you've earned a dedicated engine. Start an app from this function and you'll rarely over-provision.
Common pitfalls
- Reaching for a dedicated DB on day one. Most prototypes never outgrow pgvector or even an in-memory index. Adding a separate vector service early buys cost and ops drag you don't need yet.
- Confusing FAISS with a database. No persistence, no filtering, no network API. Great as an engine; wrong as your storage layer if you need any of those.
- Ignoring filtering until launch. "Find similar where tenant = X" is most production queries. If you discover at launch that your store filters poorly, you're migrating. Test filtered queries early.
- Forgetting embedding dimensions are fixed per model. Your vector size is set by your embedding model. Switch models and you re-embed everything — the database choice doesn't save you, so pin the model first.
- Optimizing for a scale you'll never hit. Building for a billion vectors when you have a hundred thousand wastes money and time you could spend on retrieval quality, which matters far more than raw store throughput.
Going deeper
Underneath every vector store is an ANN index — approximate nearest neighbor. Exact search compares your query to every vector, which is accurate but slow at scale. ANN trades a sliver of accuracy for huge speed by searching a clever data structure instead. The dominant one is HNSW (Hierarchical Navigable Small World), a layered graph you hop across to reach close neighbors fast; pgvector, Qdrant, Weaviate, and Milvus all offer it. Older alternatives like IVF (inverted file) partition the space into buckets. The knobs that tune these — ef_search, M, nprobe — trade recall against latency and memory, and they matter once you're past a few million vectors.
Recall is the metric that actually measures index quality: of the truly nearest vectors, what fraction did the index return? An index at 95% recall silently drops 1 in 20 of the best matches — which in RAG can mean the one paragraph that answered the question never reaches the model. The open, vendor-neutral place to see these trade-offs is the ANN-Benchmarks project, which plots recall against queries-per-second across dozens of engines on standard datasets. Read it to understand shapes of trade-offs, not to crown a winner for your workload.
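Measuring recall on your own data is straightforward once you have exact results to compare against: run the same query through a brute-force search and through the index, then check the overlap. A minimal sketch, with hypothetical document IDs:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the index actually returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    hits = len(set(approx_ids) & set(exact_ids))
    return hits / len(exact_ids)


# Hypothetical top-5 results for one query.
exact = ["d1", "d2", "d3", "d4", "d5"]    # from exhaustive search
approx = ["d1", "d2", "d4", "d5", "d9"]   # from the ANN index

print(recall_at_k(approx, exact))
# -> 0.8
```

In practice you would average this over a few hundred representative queries, and repeat it whenever you change index parameters such as ef_search or nprobe.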
Production concerns the spec sheets bury: memory footprint (HNSW graphs are RAM-hungry; quantization shrinks vectors to fit more per gigabyte at some accuracy cost), filtered-search correctness (naive engines filter after the ANN search and can return too few results — "pre-filtering" vs "post-filtering" is a real and consequential difference), freshness (how fast a newly upserted vector becomes searchable), and multi-tenancy (isolating one customer's vectors from another's without spinning up a database per tenant). These rarely show up in a quickstart and always show up in production.
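The memory point is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below counts only the raw vector payload. It ignores HNSW graph links, metadata, and engine overhead, so treat the result as a lower bound. The 5 million vectors and 768 dimensions are assumed figures, and one byte per value stands in for a simple int8 scalar quantization.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the vectors alone (float32 uses 4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value


# Assumed workload: 5 million chunks embedded at 768 dimensions.
n, dim = 5_000_000, 768
float32_gib = raw_vector_bytes(n, dim, bytes_per_value=4) / GIB
int8_gib = raw_vector_bytes(n, dim, bytes_per_value=1) / GIB

print(f"float32: {float32_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
# -> float32: 14.3 GiB, int8: 3.6 GiB
```

Running this with your own vector count and embedding dimension shows quickly whether the index fits on one machine or whether quantization or sharding needs to be part of the plan.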
Finally, the frontier is blurring the categories. Postgres with pgvector keeps closing the gap with dedicated engines for mid-scale workloads; dedicated engines keep adding hybrid search, reranking, and built-in embedding generation to become end-to-end retrieval platforms. The durable advice survives all of it: the database is rarely your retrieval bottleneck. Chunking, embedding quality, filtering, and reranking move accuracy far more than which store you chose — so spend your effort on evaluating retrieval, and treat the database as the swappable, well-understood part of the stack it has become.
FAQ
What is the best vector database for RAG?
There's no single best — it depends on scale and stack. For most RAG apps under a few million chunks, pgvector inside your existing Postgres is the pragmatic default. Need zero operations? Pinecone. Need heavy metadata filtering and hybrid search self-hosted? Qdrant, Weaviate, or Milvus. Match the tool to your constraints, not to a leaderboard.
Pinecone vs pgvector vs Qdrant — which should I pick?
Pinecone is fully managed (you pay a vendor, do zero ops). pgvector lives inside Postgres (best if you already use Postgres and have up to a few million vectors). Qdrant is an open-source engine with first-class filtering and hybrid search, self-hostable or managed. Already on Postgres at modest scale? pgvector. Want no ops? Pinecone. Want control plus rich filtering? Qdrant.
Do I even need a vector database, or can I use a library?
Under roughly 10,000 vectors over a fixed set, a library like FAISS in memory is instant and needs no database. You need a real database once you must persist, update vectors live, filter by metadata, or scale past what fits comfortably in memory. Many prototypes never cross that line.
Is pgvector good enough, or do I need a dedicated vector database?
pgvector is genuinely good enough for a large share of apps — up to several million vectors with solid filtering via SQL WHERE clauses, all inside a database you already operate. Consider a dedicated engine when you pass tens of millions of vectors, need distributed sharding, or require advanced hybrid search that's painful to bolt onto Postgres.
Managed vs self-hosted vector database — how do I decide?
If you have no platform team and want to ship fast, go managed (Pinecone, or the cloud tier of Qdrant, Weaviate, or Milvus) — the vendor handles scaling and backups. If you have infrastructure experience and cost pressure at scale, self-host the open-source engine for lower per-vector cost and full control. The trade is operational effort versus money and lock-in.
Does the choice of vector database affect search accuracy?
Less than people think. Accuracy is driven mostly by your embedding model, chunking, filtering, and reranking — not the store. Databases differ at the margins (ANN recall settings, filtered-search correctness), but a good pipeline on pgvector will out-retrieve a sloppy pipeline on the fanciest engine. Invest in retrieval quality first.