What Is a Vector Database? (and When You Actually Need One)

Get the core mental model: what a vector database stores, the queries it answers, and how it slots into a RAG or search stack.

BEGINNER · 10 MIN READ · UPDATED 2026-06-11

In plain English

A vector database is a search engine for meaning. You hand it a piece of text, an image, or a question, and it finds the items in your collection that are most similar in meaning — not the ones that share the same keywords. Ask it "how do I cancel my plan?" and it can surface a help article titled "Ending your subscription," even though the two sentences don't share a single important word.

The everyday analogy is a librarian who has read every book. A normal database is the card catalog: it can find a book only if you know the exact title or author. The well-read librarian is different — you describe what you're after ("something about lonely robots learning to feel"), and they walk you to the right shelf, because they understand what the books are about. A vector database is that librarian, built from math.

The trick that makes this work is an embedding: a model turns each piece of text into a long list of numbers — a vector — that captures its meaning. Similar meanings produce vectors that sit close together in space. The vector database's one job is to store millions of these vectors and, given a new one, find its nearest neighbors fast. That's the whole product: store vectors, find the closest ones.

Why it matters

For decades, search meant matching keywords. Type "car" and you got documents containing the literal string "car" — nothing for "automobile," "vehicle," or "sedan." That worked, but it was brittle: users had to guess the author's exact wording, and synonyms, typos, and paraphrases all slipped through the cracks. Semantic search — powered by vectors — fixes that by matching on meaning instead of spelling.

The bigger reason vector databases exploded recently is LLMs. A language model only knows what it saw during training, and it can only "see" the text you put in its context window right now. You can't paste your company's 10,000-page knowledge base into every prompt. So instead you store all of it as vectors, and at question time you retrieve only the handful of passages that actually answer the question and feed those to the model. That pattern is retrieval-augmented generation, or RAG, and the vector database is its memory.

Who should care:

Anyone building a chatbot over their own docs — the vector database is what lets the bot answer from your content instead of making things up.
Search and recommendation teams — "more like this," duplicate detection, and find-similar features are all nearest-neighbor queries.
AI agents — long-term memory for an agent is usually a vector store it writes to and reads back from across sessions.

What it replaced isn't relational databases — those aren't going anywhere. It replaced the assumption that finding relevant information requires the user and the author to use the same words. Once meaning became a coordinate you could measure distance against, that assumption fell.

How it works

There are two phases. Ingestion happens once, ahead of time: you take every document, run it through an embedding model, and store the resulting vector (plus the original text and any metadata) in the database. Querying happens at request time: you embed the user's question with the same model, then ask the database for the stored vectors closest to it.

// Ingestion: text becomes searchable vectors (done once)

Your documents: docs, FAQs, tickets
Split into chunks: paragraph-sized pieces
Embedding model: text → vector
Store in DB: vector + text + metadata

"Closest" needs a definition. The database measures distance between vectors with a similarity metric — most commonly cosine similarity, which compares the angle between two vectors and ignores their length. Two vectors pointing the same direction score 1.0 (identical meaning); perpendicular ones score 0.0 (unrelated). Euclidean (L2) distance and the dot product are the other common choices. You pick one when you create the index, and it should match what your embedding model was trained for — most modern text embedders expect cosine.

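To make the three metrics concrete, here is a minimal sketch in plain Python. The function names are illustrative and not taken from any particular library; real systems compute these with optimized native code.

```python
import math

def dot(a, b):
    # Dot product: sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by both lengths, so only the angle matters.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_distance(a, b):
    # Euclidean (straight-line) distance; smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same direction, different length: cosine ignores the length.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: nothing in common.
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(round(same_direction, 6), perpendicular)  # prints: 1.0 0.0
print(l2_distance([1.0, 2.0], [2.0, 4.0]))      # nonzero: L2 does care about length
```

The first pair scores 1.0 on cosine even though the vectors are a nonzero L2 distance apart, which is one reason the metric has to match what the embedding model expects.
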
// Querying: find the nearest neighbors (every request)

User question: "how do I cancel?"
Same embedding model: question → vector
Nearest-neighbor search: compare against all vectors
Top-k results: the 3–10 closest chunks
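
The two phases can be sketched as a tiny in-memory store. This is a toy under stated assumptions: TinyVectorStore is a hypothetical class, and the hand-written 3-D vectors stand in for the output of a real embedding model. The search is exact (brute force), scoring every stored vector.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Exact (brute-force) nearest-neighbor search over an in-memory list."""

    def __init__(self):
        self.records = []  # each record: (vector, text, metadata)

    def add(self, vector, text, metadata=None):
        # Ingestion: keep the vector together with the text it came from.
        self.records.append((vector, text, metadata or {}))

    def query(self, query_vector, k=3):
        # Querying: score every stored vector, then keep the k best.
        scored = (
            (cosine_similarity(query_vector, vector), text)
            for vector, text, _ in self.records
        )
        return heapq.nlargest(k, scored)

store = TinyVectorStore()
# Hand-written vectors; a real pipeline would call an embedding model here.
store.add([0.9, 0.1, 0.0], "Ending your subscription", {"doc_type": "faq"})
store.add([0.1, 0.9, 0.0], "Reset your password", {"doc_type": "faq"})
store.add([0.0, 0.2, 0.9], "Refund policy", {"doc_type": "policy"})

top = store.query([0.8, 0.2, 0.1], k=2)
print([text for _, text in top])
# prints: ['Ending your subscription', 'Reset your password']
```

Every query touches every record, which is fine for three documents and is the cost that the indexing techniques discussed next try to avoid.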

The hard part: doing it fast

Comparing your query against every stored vector one by one (a brute-force or exact search) is simple and perfectly accurate, but it gets slow as your collection grows into the millions. So vector databases use an approximate nearest neighbor (ANN) index: a data structure that finds vectors that are almost always among the closest while checking only a tiny fraction of the collection. You trade a sliver of accuracy (maybe you occasionally miss the 8th-best match) for search that stays fast at scale. The most popular ANN algorithm is HNSW, which arranges vectors into a layered, navigable graph you can traverse in a few hops.
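
The "few hops" idea can be illustrated with a greedy walk over a small graph. This is not HNSW itself, only a sketch of the traversal idea behind it, under loud assumptions: the points are 1-D numbers for readability, and the graph is hand-built with short links (to i±1) and long links (to i±10) rather than constructed by a real index.

```python
def build_toy_graph(n, skip=10):
    """Link each node i to i±1 (short links) and i±skip (long links)."""
    graph = {}
    for i in range(n):
        candidates = (i - 1, i + 1, i - skip, i + skip)
        graph[i] = [j for j in candidates if 0 <= j < n]
    return graph

def greedy_search(points, graph, query, entry=0):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current = entry
    current_dist = abs(points[current] - query)
    hops = [current]
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = abs(points[neighbor] - query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # No neighbor is closer: stop at this local minimum.
            return current, hops
        current, current_dist = best, best_dist
        hops.append(current)

points = [float(i) for i in range(100)]
graph = build_toy_graph(len(points))
nearest, hops = greedy_search(points, graph, query=57.3)
print(nearest, len(hops))  # prints: 57 10
```

The walk reaches the true nearest point after visiting 10 of the 100 nodes, using long links to cover distance and short links to finish. On real data a greedy walk can stop at a local minimum that is not the true nearest neighbor, which is where the "approximate" in ANN comes from.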

Vector database vs. relational database

A regular (relational/SQL) database answers questions about exact values: "give me all orders where status = 'shipped' and total > 100." It's built on indexes that find exact matches and ranges instantly. A vector database answers a different question entirely: "give me the items most similar to this one," where similarity is a fuzzy, learned notion of meaning. They're not competitors so much as different tools for different questions.

// Two databases, two kinds of question

Relational (SQL)

Stores rows and columns
Query: exact match + ranges
"status = shipped, total > 100"
Answer is right or wrong
Examples: Postgres, MySQL

Vector database

Stores high-dimensional vectors
Query: nearest neighbors
"most similar to this question"
Answer is ranked by closeness
Examples: Pinecone, Qdrant, Chroma

The lines blur in practice. Postgres can do both with the pgvector extension, which adds a vector column type and ANN indexes to a database you may already run. The decision between "add vectors to my existing database" and "adopt a dedicated vector store" is the central one when you start — and it's covered in how to choose a vector database. Most teams also do metadata filtering: combine a similarity search with an exact filter ("closest chunks where language = 'en' and doc_type = 'manual'"), which is why a vector database stores metadata alongside each vector, not just the raw numbers.
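
Metadata filtering can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a hypothetical record layout (vector, text, metadata) and a hypothetical filtered_search helper. It applies the exact filter first and ranks the survivors by dot product; real stores may instead apply the filter during index traversal.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_search(records, query_vector, where, k=2):
    """Pre-filter on exact metadata, then rank the survivors by similarity."""
    matches = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        matches, key=lambda r: dot(query_vector, r["vector"]), reverse=True
    )
    return [r["text"] for r in ranked[:k]]

# Toy 2-D vectors stand in for real embeddings.
records = [
    {"vector": [0.9, 0.1], "text": "Cancel your plan (manual)",
     "metadata": {"language": "en", "doc_type": "manual"}},
    {"vector": [0.95, 0.05], "text": "Annuler votre forfait (manuel)",
     "metadata": {"language": "fr", "doc_type": "manual"}},
    {"vector": [0.2, 0.8], "text": "Reset password (manual)",
     "metadata": {"language": "en", "doc_type": "manual"}},
    {"vector": [0.85, 0.15], "text": "Cancel your plan (blog)",
     "metadata": {"language": "en", "doc_type": "blog"}},
]

hits = filtered_search(
    records, [1.0, 0.0], where={"language": "en", "doc_type": "manual"}
)
print(hits)
# prints: ['Cancel your plan (manual)', 'Reset password (manual)']
```

The French manual is the closest vector overall, but it never reaches the ranking step because the exact filter removes it first.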

A tiny working example

You don't need a server to see this work. Chroma is an open-source vector database that runs in-process with a few lines of Python — perfect for learning. It even calls a default embedding model for you, so you can store sentences and search them by meaning right away.

first_vector_db.py (Python)

# pip install chromadb
import chromadb

client = chromadb.Client()                 # in-memory, no server needed
collection = client.create_collection("faq")

# Ingestion: store documents (Chroma embeds them automatically)
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "How to end your subscription and stop billing.",
        "Reset your password from the login screen.",
        "Our refund policy covers the first 30 days.",
    ],
)

# Query: search by MEANING, not keywords
results = collection.query(
    query_texts=["how do I cancel my plan?"],
    n_results=1,
)
print(results["documents"])
# -> [['How to end your subscription and stop billing.']]

Notice what happened: the query "how do I cancel my plan?" shares no meaningful words with "end your subscription," yet it's the top hit. A strict keyword search would likely have missed it. Behind the scenes Chroma embedded all three documents at add() time, embedded your question at query() time, and returned the nearest neighbor by vector distance (squared L2 by default; cosine is a configuration option). That is a complete vector-search pipeline in about fifteen lines. Production systems are the same shape with a real server, millions of vectors, and metadata filters bolted on.

The tool landscape

The space looks crowded, but the options sort into a few buckets. The right pick depends on whether you already run Postgres, how many vectors you have, and whether you want to manage infrastructure yourself.

Tool	Type	Good first reach-for when…
pgvector	Postgres extension	You already use Postgres and want one database for everything
Chroma	Open-source, embeddable	Prototyping or a local RAG demo; runs in-process
Qdrant	Open-source + cloud	You want a dedicated store with strong metadata filtering
Weaviate	Open-source + cloud	You want built-in modules and hybrid search out of the box
Milvus	Open-source, scale-out	Billions of vectors and a distributed deployment
Pinecone	Fully managed cloud	You want zero infrastructure to run and operate
FAISS	Library, not a database	You need a raw, fast ANN index inside your own app code

One distinction worth internalizing early: FAISS is a library, not a database. It's a very fast ANN index library from Meta that you embed in your code, but it doesn't handle storage, metadata, updates, or persistence on its own; you build those around it. Many "vector databases" use FAISS or their own HNSW implementation under the hood and add the database parts (an API, durability, filtering, scaling) on top. When someone says "just use FAISS," they mean the index library, not a full product.

Going deeper

Everything above is the mental model. Here's what bites once you're running one in production. The headline tension is the accuracy-speed-memory triangle of the ANN index. HNSW, the default in most stores, holds the whole graph in RAM and is very fast and accurate — but a million 1,536-dimension float32 vectors is roughly 6 GB before the graph overhead, so memory is the real cost driver, not CPU. Two knobs control its trade-off: ef_construction (how thoroughly the graph is built) and ef_search (how widely you explore at query time). Turn them up for recall, down for speed.
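
The memory figure above is simple arithmetic, shown here as a short sketch. The helper name raw_vector_bytes is hypothetical, and the result covers only the raw vectors, not the graph overhead or metadata.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 = 4 bytes per value)."""
    return num_vectors * dimensions * bytes_per_value

total = raw_vector_bytes(1_000_000, 1536)
print(f"{total / 1e9:.2f} GB")  # prints: 6.14 GB
```

Because the cost scales linearly in both vector count and dimension, halving the embedding dimension or the bytes per value halves the raw footprint.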

// What's inside a production vector store

Query API + filtering: search, metadata filters, top-k
ANN index: HNSW graph or IVF clusters
Vector storage: raw vectors, maybe compressed
Metadata + original text: what you actually return

A few threads that matter as you scale. Quantization shrinks vectors by storing each number as one byte instead of four, or by using binary or product quantization. That cuts memory by roughly 4x (one byte per value) to 32x (one bit per value) for a small recall hit, and it's how billion-vector indexes stay affordable. IVF (inverted file) is the other big index family: it clusters vectors into buckets and searches only the nearest few buckets, trading some recall for far less memory than HNSW. Hybrid search combines vector similarity with old-school keyword search (BM25) and fuses the rankings, because pure semantic search can miss exact identifiers like part numbers or error codes that keyword search nails.
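
The one-byte-per-number idea can be sketched as simple scalar quantization. This is a minimal illustration, assuming a per-vector min/max range; the quantize and dequantize helpers are hypothetical, and production stores use more careful calibration.

```python
def quantize(vector):
    """Map each float to an integer code in 0..255 (one byte per value)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the byte codes."""
    return [lo + c * scale for c in codes]

vector = [-1.0, -0.2, 0.3, 1.0]
codes, lo, scale = quantize(vector)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest code means the error is at most half a step.
max_error = max(abs(a - b) for a, b in zip(vector, restored))
print(len(codes), max_error <= scale / 2)  # prints: 4 True
```

Four values now take four bytes instead of the sixteen that float32 would need, and the reconstruction error stays within half a quantization step. That small error is the source of the recall hit.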

The honest open problems: keeping an ANN index fresh as documents change is harder than it sounds — most indexes favor batch rebuilds over constant updates, so high-churn data needs care. Filtered search (similarity plus a strict metadata condition) can quietly wreck recall if the index and filter fight each other, which is why mature stores invest heavily in it. And similarity search is only the first stage of good retrieval — production retrieval pipelines usually pull a generous top-50 from the vector database and then rerank them with a slower, sharper model before handing the best few to the LLM. The vector database gets you candidates; turning candidates into the right answer is the rest of the RAG stack.
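
The retrieve-then-rerank pattern can be sketched as two sorts. Everything here is a stand-in: the candidate scores are made-up numbers, and retrieve_then_rerank is a hypothetical helper. In a real pipeline the first score would come from the vector database and the second from a reranking model.

```python
def retrieve_then_rerank(candidates, cheap_score, sharp_score, recall_k, final_k):
    """Stage 1: take a generous shortlist by the cheap score.
    Stage 2: reorder only that shortlist with the slower, sharper score."""
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:recall_k]
    return sorted(shortlist, key=sharp_score, reverse=True)[:final_k]

# Made-up scores for illustration only.
candidates = [
    {"id": "a", "vector_score": 0.91, "rerank_score": 0.40},
    {"id": "b", "vector_score": 0.88, "rerank_score": 0.95},
    {"id": "c", "vector_score": 0.80, "rerank_score": 0.70},
    {"id": "d", "vector_score": 0.75, "rerank_score": 0.99},
    {"id": "e", "vector_score": 0.30, "rerank_score": 1.00},
]

best = retrieve_then_rerank(
    candidates,
    cheap_score=lambda c: c["vector_score"],
    sharp_score=lambda c: c["rerank_score"],
    recall_k=4,
    final_k=2,
)
print([c["id"] for c in best])  # prints: ['d', 'b']
```

Candidate "e" would have ranked first under the sharper score but never made the shortlist, which is why the first-stage cutoff is kept generous.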

FAQ

What is a vector database used for?

Mainly semantic search and the retrieval step of RAG — finding the documents most similar in meaning to a query so a chatbot can answer from your own content. It also powers recommendations ("more like this"), duplicate and near-duplicate detection, image and audio similarity search, and long-term memory for AI agents.

What's the difference between a vector database and a regular database?

A regular (SQL) database answers exact-match and range questions — "find rows where status = shipped." A vector database answers similarity questions — "find the items most similar in meaning to this one," ranked by distance. They solve different problems, and tools like Postgres with the pgvector extension can do both at once.

Do I actually need a vector database for RAG?

Not always. For a few thousand documents, a simple library like FAISS or an in-memory index, or even pgvector on a database you already run, is plenty. You reach for a dedicated vector database when you have many millions of vectors, need metadata filtering at scale, or want managed infrastructure with high availability.

How does a vector database find similar items so fast?

It uses an Approximate Nearest Neighbor (ANN) index, most often HNSW, which arranges vectors into a navigable graph so a search visits only a tiny fraction of them. This trades a small amount of accuracy for search that stays fast as the collection grows into the millions, instead of comparing against every vector one by one.

Can I just use Postgres instead of a dedicated vector database?

Often, yes. The pgvector extension adds a vector column type and ANN indexes to PostgreSQL, so you keep your vectors, relational data, and SQL filters in one place. It's a great default until you hit very large scale or need features a specialized store provides — see the decision framework on choosing a vector database.
