AI/TLDR

What Is pgvector? Vector Search Inside PostgreSQL

Learn how a single Postgres extension gives you production vector search right next to the data you already have.

BEGINNER12 MIN READUPDATED 2026-06-12

In plain English

pgvector is a free, open-source extension for PostgreSQL that turns your existing Postgres database into a vector search engine. Once installed, you can store embeddings in ordinary table columns and ask questions like "find me the ten rows most semantically similar to this query" using a single SQL statement — no new infrastructure required.

The analogy that makes this click: imagine your Postgres database is a filing cabinet. Normally the cabinet sorts by date, alphabetically, or by ID — it can only find a folder if you know its exact label. pgvector adds a smart drawer that sorts by meaning. Slide in a piece of text and the drawer returns the folders whose contents are closest in topic, even if none of them use your exact words. That "closest in meaning" lookup is a nearest-neighbor vector search, and pgvector wires it directly into the database you already operate.

In practice it works like this: an embedding model (say, OpenAI's text-embedding-3-small or a local model via Ollama) turns each document into a vector — a long array of numbers like [0.12, -0.55, 0.03, ...] with 384 to 3,072 values. pgvector stores those arrays in a vector column, builds an index over them, and exposes three distance operators so you can ORDER BY similarity in plain SQL.

Why it matters

Before pgvector, adding semantic search to a Postgres-backed app meant introducing a second data store: something like Pinecone, Weaviate, or Qdrant running alongside your relational database. That creates a split-brain problem: your canonical data lives in Postgres but your search index lives somewhere else, and you have to keep them in sync, manage two connection pools, handle failures in two systems, and write code that joins data across network boundaries.

pgvector collapses that complexity. Your embeddings live in the same table as the document text, user ID, and created-at timestamp. A RAG query that retrieves the five most relevant documents for a user, filtered to content they have permission to see, is a single SQL query — no round-trip to an external service, no stale-sync bugs, full ACID guarantees.

Who should reach for it

  • RAG applications — store document chunks and their embeddings side by side; retrieve context with a LIMIT 5 ORDER BY embedding <=> $query_vec query.
  • Semantic search over your own content — support tickets, knowledge-base articles, product catalog descriptions, job postings.
  • Recommendation features — "more like this" on products, posts, or profiles.
  • Duplicate / near-duplicate detection — cluster or flag items that embed close together.
  • Teams already operating Postgres — zero new infrastructure, one fewer on-call rotation.

For datasets under roughly 5–10 million vectors on appropriately sized hardware, pgvector delivers query latency under 20 ms with recall rates above 95% — performance that is indistinguishable from most dedicated vector databases at a fraction of the operational complexity.

How it works

pgvector adds four new data types, three distance operators, and two index algorithms to Postgres. Understanding these three pieces gives you a complete mental model.

Step 1 — Enable the extension and add a column

sqlsql
-- Install once per database
CREATE EXTENSION IF NOT EXISTS vector;

-- Add an embedding column to an existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Or create a new table with embeddings
CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT,
  embedding vector(1536)
);

The 1536 is the dimension count — the length of the arrays you will store. This must match your embedding model's output dimension (OpenAI's text-embedding-3-small = 1,536; text-embedding-3-large = 3,072; many open-source models = 384 or 768).

Step 2 — Insert embeddings

sqlsql
-- Insert a document with its embedding (values abbreviated)
INSERT INTO documents (content, embedding)
VALUES ('pgvector adds vector search to Postgres', '[0.021, -0.55, 0.13, ...]');

-- Or from Python with the psycopg2 / asyncpg adapter
-- embedding = openai.embeddings.create(...).data[0].embedding
-- cursor.execute("INSERT INTO documents (content, embedding) VALUES (%s, %s)", (text, embedding))

Step 3 — Search with a distance operator

pgvector exposes three operators for similarity queries:

OperatorDistance metricBest for
<->L2 (Euclidean) distanceGeometry, image embeddings where magnitude matters
<=>Cosine distanceText embeddings from most language models (default recommendation)
<#>Negative inner productVectors already normalized to unit length
sqlsql
-- Return the 5 documents most similar to a query vector
SELECT id, content, 1 - (embedding <=> '[0.021, -0.55, 0.13, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.021, -0.55, 0.13, ...]'
LIMIT 5;

-- Combine with a regular filter (SQL query planner handles both)
SELECT id, content
FROM documents
WHERE user_id = 42
ORDER BY embedding <=> $1
LIMIT 5;

Step 4 — Add an index for speed

Without an index, pgvector performs an exact sequential scan — it computes the distance to every row. This is fine up to tens of thousands of rows but becomes slow at larger scales. Two index types are available: HNSW and IVFFlat.

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph over your vectors. The top layers are sparse long-range connections; the bottom layer is dense. A query enters at the top, greedily walks toward the nearest node at each hop, then descends until it reaches the bottom layer and collects the final nearest neighbors. This approach provides excellent recall (typically 95–99%) with sub-millisecond traversal on millions of vectors.

sqlsql
-- Build an HNSW index for cosine similarity (recommended default)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Tune build parameters for higher recall (at cost of build time + memory)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
-- m: number of connections per node (default 16)
-- ef_construction: candidate list size during build (default 64)

Vector types and distance metrics

pgvector 0.7.0 (April 2024) added three new data types beyond the original vector type, each targeting different memory and dimensionality trade-offs:

TypePrecisionMax indexable dimensionsUse case
vector32-bit float (full)2,000 (HNSW) / 2,000 (IVFFlat)Standard embeddings — the default choice
halfvec16-bit float (half)4,000Cut storage and memory roughly in half with minimal recall loss
sparsevecSparse (nonzero only)1,000 nonzero dimsBM25-style or SPLADE sparse embeddings
bitBinary (1 bit/dim)64,000Binary quantization for extreme speed/storage savings

For typical RAG work with models like OpenAI's text-embedding-3-small (1,536 dims) or nomic-embed-text (768 dims), the standard vector type is the right starting point. Switch to halfvec when your index is eating too much RAM — the precision drop is usually imperceptible for semantic search.

Choosing the right distance metric

The most important rule: use the same metric your embedding model was trained with. Most modern text embedding models (OpenAI, Cohere, Sentence-Transformers, etc.) are trained to be compared via cosine similarity, so use vector_cosine_ops for your index and <=> for your queries. If your model documentation says "use dot product", use vector_ip_ops and <#>. Mixing metrics silently returns wrong results.

pgvector vs dedicated vector databases

The question "should I use pgvector or Pinecone/Qdrant/Weaviate?" comes down to data volume, query latency requirements, and how much you value operational simplicity. Here is an honest comparison based on 2025 benchmarks and production reports.

DimensionpgvectorDedicated (e.g. Pinecone, Qdrant)
Setup costZero if you already run PostgresNew service, new billing, new ops
Data consistencyFull ACID — vectors and metadata in one transactionEventual sync between two systems
Filtered searchPostgres query planner handles filters + vector togetherFilters can hurt recall; varies by product
Scale ceilingPractical limit ~5–10M vectors per instanceDesigned for hundreds of millions to billions
Query latency (<1M vecs)Under 20 ms with HNSW, 95%+ recallSimilar or slightly better
Cost at scale~75% cheaper than Pinecone s1 tier (Timescale benchmark)Higher but includes managed horizontal scaling
Operational burdenLow — it is just PostgresNear-zero for managed tiers; higher for self-hosted

Decision guide

Start with pgvector if your dataset is under five million vectors and your team already operates Postgres. The unified schema means one fewer service, one fewer failure domain, and zero data-sync headaches. Confident AI publicly documented replacing Pinecone with pgvector and noted that the switch eliminated an entire class of bugs caused by out-of-sync states between their relational and vector stores.

Switch to a dedicated vector database when you regularly exceed 10–50 million vectors, when you need horizontal sharding, or when you require multi-region replication with sub-10 ms global latency. At that scale, Pinecone's serverless tier or Qdrant's distributed mode genuinely outperform a single Postgres instance. The crossover point is not one size fits all — run your own benchmark with realistic data volumes and query patterns.

Going deeper

Once you have a working pgvector setup, these advanced topics are where most production systems need tuning.

HNSW tuning parameters

Two build-time parameters control the HNSW index's recall-vs-speed tradeoff: m (the number of bidirectional links per node, default 16) and ef_construction (the candidate list size during index build, default 64). Higher values improve recall but increase build time and index size. At query time, hnsw.ef_search (default 40) controls how many candidates are explored — raise it for higher recall at the cost of slower queries.

sqlsql
-- Raise ef_search for a single session (more recall, slightly slower)
SET hnsw.ef_search = 100;

-- Check actual recall by comparing approximate vs exact results
SELECT COUNT(*) FROM (
  SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10
) approximate
WHERE id IN (
  SELECT id FROM documents ORDER BY embedding <-> $1 LIMIT 10  -- exact scan
);

Iterative index scans (pgvector 0.8.0+)

pgvector 0.8.0, released October 2024, added iterative index scans. The problem it solves: when a query combines vector search with a highly selective WHERE clause, the initial approximate scan might not return enough rows that pass the filter. Previously you would miss results. With iterative scanning enabled, pgvector automatically expands the search until it has enough qualifying rows or hits a configurable limit. Enable it with SET hnsw.iterative_scan = relaxed_order;.

Hybrid search: combining BM25 + vector

Pure vector search misses exact keyword matches — a query for a product SKU like "RTX-4090" should surface results lexically, not semantically. The industry solution is hybrid search: combine a BM25 (keyword) score with a vector similarity score, then re-rank. In Postgres you can use the pg_trgm or tsvector full-text search facilities alongside pgvector and merge scores with Reciprocal Rank Fusion (RRF). Frameworks like LangChain and LlamaIndex offer Postgres-native hybrid search built on this pattern.

pgvectorscale and DiskANN

Timescale released pgvectorscale, a companion extension that implements StreamingDiskANN — a disk-based index that does not require fitting the entire index in RAM. Benchmarks show 28x lower p95 latency compared to Pinecone's s1 storage-optimized tier at 99% recall, and roughly 75% lower cost. If you need to push past pgvector's memory constraints without migrating to a dedicated database, pgvectorscale is the next step up.

Production checklist

  • Build the HNSW index before you query — without an index, every query is a full sequential scan.
  • Match the operator class to your metricvector_cosine_ops for cosine, vector_l2_ops for L2; mismatch silently degrades results.
  • Normalize vectors before insert if using inner product (<#>) to avoid scale artifacts.
  • Benchmark with your real data — recall degrades differently depending on embedding distribution; synthetic benchmarks can mislead.
  • Monitor pg_stat_user_indexes to confirm the index is actually being hit in your queries.
  • Consider halfvec if RAM is tight — halving the storage footprint of a 1,536-dim embedding index frees significant memory with negligible recall loss.

FAQ

Do I need to install anything special to use pgvector?

On most managed Postgres providers (AWS RDS, Aurora, Supabase, Neon, Google Cloud SQL) pgvector is already available — just run CREATE EXTENSION vector; in your database. For self-hosted Postgres you install the extension from the pgvector GitHub repository or via your distro's package manager, then run the same command.

What is the maximum number of dimensions pgvector supports?

The standard vector type supports up to 16,000 dimensions for storage but the HNSW and IVFFlat indexes are limited to 2,000 dimensions. The halfvec type indexes up to 4,000 dimensions, and the bit binary type indexes up to 64,000 dimensions. For most text embedding models (384–3,072 dims) the standard vector type with HNSW indexing works fine.

How does pgvector's HNSW index compare to IVFFlat?

HNSW generally provides better query performance (higher recall at lower latency) and does not require a training step, so you can build it on an empty table. IVFFlat builds faster and uses less memory, but requires data to already be in the table for the training step and performs worse when data distribution changes. HNSW is the recommended default for most workloads.

Can pgvector handle hybrid search (keyword + semantic)?

Yes, because it runs inside Postgres. You can combine pgvector similarity queries with Postgres full-text search (tsvector/tsquery or pg_trgm) in a single SQL query, then merge scores using Reciprocal Rank Fusion. Many AI frameworks including LangChain and LlamaIndex have built-in Postgres hybrid search support that uses exactly this approach.

When should I use pgvector instead of Pinecone or Qdrant?

Use pgvector when your dataset is under 5–10 million vectors and your team already operates Postgres. The big win is a unified schema — no sync between a relational database and a separate vector store. Switch to a dedicated vector database when you need billions of vectors, horizontal sharding, or sub-10 ms multi-region latency at sustained high query-per-second rates.

Does pgvector slow down my existing Postgres queries?

Installing the extension has no effect on non-vector queries. Adding a vector column to an existing table is an ordinary schema change. The HNSW index only affects write throughput during bulk inserts slightly — it builds the graph incrementally. Reads on non-vector columns are completely unaffected.

Further reading