AI/TLDR

What Is Semantic Search? Finding Meaning, Not Keywords

Learn how search engines find 'laptop' when you typed 'notebook computer' — and how to build the same thing with embeddings.

Beginner · 10 min read · Updated 2026-06-11

In plain English

Type "notebook computer" into an old-school search box and it hunts for documents containing those exact words. A page that only says "laptop" — the same thing in different words — gets zero. The search engine isn't matching meaning; it's matching letters. That's keyword search, and it's been the default for decades.

Semantic search matches meaning instead. Ask it for "notebook computer" and it happily returns the page about laptops, because it understands the two phrases point at the same idea. It can find "how do I cancel my plan" when the help article is titled "ending your subscription," with no shared words at all.

Here's the everyday analogy. Imagine a librarian who has read every book in the building. You don't walk up and recite exact title words — you say "I want something about a kid who discovers they're a wizard," and they walk you straight to Harry Potter. They didn't grep the catalog for your phrasing; they understood the gist and matched it to books with the same gist. Semantic search is that librarian, turned into math.

The trick that makes it work is the embedding: a model converts any piece of text into a long list of numbers — a vector — positioned so that texts with similar meaning land near each other. "Laptop" and "notebook computer" end up as nearby points. "Laptop" and "banana split" end up far apart. Search becomes geometry: find the points closest to your question.

Why it matters

Keyword search has a wall it can't climb: it only knows the words you used. Real people don't use the same words as your documents. They type "my card got declined" while your knowledge base says "payment failure." They search "fix slow laptop" while the article is titled "improving system performance." Every one of those is a miss — an empty results page, a frustrated user, a support ticket that didn't need to exist.

The classic patches were brittle. Teams hand-built synonym lists ("car" → "automobile" → "vehicle"), wrote stemming rules, and tuned relevance by trial and error. It never scaled — you can't enumerate every way a human might phrase a thought. Semantic search learns the relationships from data instead of you listing them by hand.

Who should care:

  • Anyone with a search box. Docs sites, help centers, e-commerce catalogs, internal wikis — semantic search turns "no results" into "the thing they actually meant."
  • Anyone building with LLMs. Semantic search is the retrieval half of RAG — it's how a chatbot finds the right three paragraphs from your 10,000-document knowledge base to answer a question accurately.
  • Anyone building recommendations. "More like this" is just semantic search where the query is an item instead of a sentence.

What did it replace? Not entirely keyword search — the two are complementary, and the strongest systems run both (more on that later). What it replaced was the era of manually curated synonym dictionaries and the assumption that users must speak your vocabulary. The burden flipped: the system now meets the user's words, instead of the user guessing yours.

How it works

Semantic search has two phases: a one-time indexing phase where you prepare your documents, and a per-query search phase that runs every time someone asks something.

Indexing (done once, ahead of time). You feed every document through an embedding model, which spits out a vector for each — say, a list of 768 or 1,536 numbers. You store those vectors, paired with the original text, in a vector database or index. This is the expensive part, but you only pay it once per document.

Search (done per query). When a question arrives, you embed it with the same model into a vector, then find the stored vectors closest to it. Those nearest vectors are your most semantically similar documents. Return them, ranked by closeness.

"Closest" needs a definition. The most common measure is cosine similarity: it looks at the angle between two vectors, ignoring their length. Vectors pointing in the same direction (angle near zero) score near 1 — very similar. Vectors at right angles score 0 — unrelated. Pointing opposite ways scores -1. So the whole search reduces to: which stored vectors point in nearly the same direction as the query vector?

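To make those three cases concrete, here is a minimal sketch of cosine similarity in plain Python. The 2-D vectors are made up for illustration (real embeddings have hundreds of dimensions), and the function name cosine_similarity is just a label chosen here, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for this example.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # same direction, different length
right_angle = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # opposite direction

print(round(same_direction, 6), round(right_angle, 6), round(opposite, 6))
```

Notice that doubling a vector's length leaves the score unchanged: only direction counts.
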
There's one catch at scale. Comparing your query against a million stored vectors one by one is slow. So vector databases use approximate nearest neighbor (ANN) indexes: data structures like HNSW that find the almost-closest matches in milliseconds by skipping most of the comparisons. You trade a little accuracy for a large speedup, and at search-engine scale that trade is almost always worth it.
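
The "skip most of the comparisons" idea can be sketched in plain Python. This is not HNSW (which is graph-based and considerably more involved) but a simpler partition-based scheme in the spirit of inverted-file (IVF) indexes: group vectors into cells, then score only the cells nearest the query. The random data, the cell count, and the names build_index, search, and n_probe are all assumptions made for this illustration.

```python
import math
import random

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_index(vectors, n_cells, rng):
    """Pick random vectors as cell centroids and assign each vector to its closest cell."""
    centroids = rng.sample(vectors, n_cells)
    cells = [[] for _ in range(n_cells)]
    for idx, vec in enumerate(vectors):
        best = max(range(n_cells), key=lambda c: cos_sim(vec, centroids[c]))
        cells[best].append(idx)
    return centroids, cells

def search(query, vectors, centroids, cells, top_k, n_probe):
    """Score only the vectors stored in the n_probe cells closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: cos_sim(query, centroids[c]), reverse=True)
    candidates = [idx for c in order[:n_probe] for idx in cells[c]]
    candidates.sort(key=lambda idx: cos_sim(query, vectors[idx]), reverse=True)
    return candidates[:top_k], len(candidates)

rng = random.Random(0)  # fixed seed so runs are repeatable
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
query = [rng.gauss(0, 1) for _ in range(16)]
centroids, cells = build_index(vectors, n_cells=20, rng=rng)

# Probing every cell is an exact search; probing 3 of 20 is the approximate one.
exact, exact_scanned = search(query, vectors, centroids, cells, top_k=5, n_probe=20)
approx, approx_scanned = search(query, vectors, centroids, cells, top_k=5, n_probe=3)
overlap = len(set(exact) & set(approx))
print(f"scanned {approx_scanned} of {exact_scanned} vectors; "
      f"{overlap} of the exact top 5 were found")
```

Raising n_probe recovers more of the exact results at the cost of scanning more vectors, which is the recall-versus-latency knob in miniature.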

Build it in Python

Semantic search is shorter to code than most people expect. This example uses sentence-transformers, a popular open-source library that runs an embedding model locally, with no API key and no per-query cost. It indexes a handful of documents, then answers a query that shares no meaningful words with the right answer.

semantic_search.py (Python)
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# A small, fast, well-known embedding model (downloads on first run).
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "How to end your subscription and stop being billed.",
    "Reset your password from the account settings page.",
    "Our office hours are 9am to 5pm, Monday to Friday.",
    "Troubleshooting a laptop that feels slow or sluggish.",
]

# INDEXING (once): turn every document into a vector.
doc_vectors = model.encode(documents, convert_to_tensor=True)

# SEARCH (per query): shares no content words with the cancel doc on purpose.
query = "i want to cancel my plan"
query_vector = model.encode(query, convert_to_tensor=True)

# Cosine similarity of the query against every document.
scores = util.cos_sim(query_vector, doc_vectors)[0]

# Rank documents by score, highest first.
ranked = scores.argsort(descending=True)
for i in ranked:
    print(f"{scores[i]:.3f}  {documents[i]}")

The top hit is the cancellation document, even though the query says "cancel my plan" and the document says "end your subscription": the only word the two share is "to". A keyword search would have had nothing meaningful to match on; semantic search ranks it first because the meanings are close. Swap the four toy documents for thousands of real ones and the main thing that changes is where you store the vectors: you'd move from an in-memory tensor to a vector database so the nearest-neighbor lookup stays fast.

Common pitfalls

Semantic search is easy to stand up and easy to get subtly wrong. The usual traps:

  • Mismatched models. Embedding documents with one model and queries with another. The vectors aren't in the same space, and results look random. Pin one model everywhere.
  • Chunks too big or too small. Embedding a whole 50-page PDF as one vector blurs every topic together; embedding single sentences loses context. Splitting documents well — see chunking in RAG — often matters more than which model you pick.
  • Expecting exact matches. Asking semantic search for an order number or an exact SKU and being surprised it returns similar ones. That's a keyword job — add hybrid search.
  • Ignoring the language/domain gap. A general model embeds legal jargon or another language weakly. Pick a model trained for your domain or language when it matters.
  • No re-ranking step. The first vector pass is fast but coarse. For quality, fetch the top ~50 and re-score them with a stronger model — that's what a retriever plus reranker does.
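
As one illustration of the chunking pitfall above, here is a minimal sketch of fixed-size chunking with overlap, so that a sentence cut at a chunk boundary still appears whole in the neighboring chunk. The function name chunk_words and the sizes (50 words, 10 overlapping) are assumptions picked for the example; real pipelines usually count tokens rather than words and often split on headings or paragraphs instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words; each shares `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

# Synthetic 120-word "document" so the boundaries are easy to see.
sample = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(sample, chunk_size=50, overlap=10)
for chunk in chunks:
    parts = chunk.split()
    print(len(parts), parts[0], "...", parts[-1])
```

Each chunk would then be embedded and stored as its own vector, with a pointer back to the source document.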

Going deeper

The quality of semantic search lives almost entirely in the embedding model. Modern text embedding models are descendants of the Sentence-BERT (SBERT) idea: take a transformer like BERT, fine-tune it with a siamese setup so that sentences with similar meaning are pulled together in vector space and dissimilar ones pushed apart. That training objective — contrastive learning — is what makes the geometry meaningful in the first place. Off-the-shelf BERT vectors, without this step, make a surprisingly poor search index.

Two structural choices define the speed/quality trade-off. A bi-encoder embeds the query and each document separately, so all document vectors can be precomputed and searched in milliseconds — this is what powers first-pass retrieval. A cross-encoder feeds the query and a document together through the model to score them jointly; far more accurate, but it can't precompute anything, so it's too slow to run over a whole corpus. The standard production pattern uses a bi-encoder to fetch a candidate set, then a cross-encoder reranker to reorder just those candidates. Fast recall, then precise ranking.
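
The two-stage pattern can be sketched without any model at all. In the code below, toy_fast_score and toy_slow_score are made-up stand-ins: in a real system the first would be bi-encoder cosine similarity over precomputed vectors and the second a cross-encoder model. What the sketch is meant to show is the shape of the pipeline, namely that the expensive scorer only ever sees the shortlist.

```python
def retrieve_then_rerank(query, documents, fast_score, slow_score, n_candidates, top_k):
    """Two-stage search: cheap scoring over everything, expensive scoring over the shortlist."""
    # Stage 1 (bi-encoder role): rank the whole corpus with the cheap scorer.
    shortlist = sorted(documents, key=lambda d: fast_score(query, d), reverse=True)[:n_candidates]
    # Stage 2 (cross-encoder role): re-score only the shortlist with the expensive scorer.
    return sorted(shortlist, key=lambda d: slow_score(query, d), reverse=True)[:top_k]

def toy_fast_score(query, doc):
    # Hypothetical stand-in for vector similarity: number of shared lowercase words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

slow_calls = []  # records which documents the expensive scorer was asked about

def toy_slow_score(query, doc):
    # Hypothetical stand-in for a cross-encoder: looks at query and document together
    # and prefers documents containing the query as an exact phrase.
    slow_calls.append(doc)
    return (query.lower() in doc.lower(), toy_fast_score(query, doc))

documents = [
    "reset password steps for your account",
    "password reset is not available for guest accounts",
    "how to reset your password",
    "office hours and holiday schedule",
    "billing and invoices",
]
results = retrieve_then_rerank("reset your password", documents,
                               toy_fast_score, toy_slow_score,
                               n_candidates=3, top_k=2)
print(results)
print(f"expensive scorer ran on {len(slow_calls)} of {len(documents)} documents")
```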

Scale brings its own engineering. Storing millions of 1,536-dimension float vectors is expensive in RAM, so production systems lean on quantization (compressing vectors to 8-bit integers or even binary codes), trading a little accuracy for big memory savings. The ANN index itself (HNSW, IVF, and friends) has knobs that trade recall against latency; tuning them is a real part of running vector search well. And embeddings are tied to the model that produced them: upgrading models means re-embedding your whole corpus, which is a migration, not a config change, because old and new vectors aren't comparable.
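
To put rough numbers on the memory point, here is a minimal sketch of 8-bit scalar quantization with one scale factor per vector. This is only one simple scheme among several, the function names are chosen for this example, and the input values are made up. The arithmetic at the end assumes 4 bytes per float32 value and 1 byte per int8 value.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using a single scale for the whole vector."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

vector = [0.12, -0.48, 0.05, 0.31, -0.27, 0.44]  # made-up embedding values
codes, scale = quantize_int8(vector)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(vector, restored))

# Storage for 1,000,000 vectors of 1,536 dimensions.
float32_gb = 1_000_000 * 1536 * 4 / 1e9  # 4 bytes per value
int8_gb = 1_000_000 * 1536 * 1 / 1e9     # 1 byte per value

print(codes)
print(f"max round-trip error: {max_error:.5f}")
print(f"{float32_gb:.3f} GB as float32 vs {int8_gb:.3f} GB as int8")
```

The round-trip error per value is at most half the scale factor, which is the "little accuracy" being traded for a 4x smaller index.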

The honest open problems: embeddings can encode the biases of their training data, similarity is not the same as relevance (the closest vector isn't always the best answer for the user's actual intent), and there's no universal best model — the right choice depends on your language, domain, latency budget, and how much you can spend per query. Public leaderboards like MTEB (the Massive Text Embedding Benchmark) help you compare models on standard tasks, but the only benchmark that truly counts is your own data with your own queries. When you wire semantic search into an LLM pipeline, these retrieval choices become the foundation everything else stands on — which is why RAG systems live or die on retrieval quality.

FAQ

What is the difference between semantic search and keyword search?

Keyword search matches the exact words you typed; semantic search matches meaning. Keyword search misses synonyms and rephrasings (it won't find "laptop" when you searched "notebook computer"), while semantic search returns them because it compares the meaning of text using embeddings rather than matching letters. Keyword search still wins for exact codes, IDs, and rare names.

How does semantic search actually work?

An embedding model converts each document into a vector — a list of numbers positioned so that similar meanings land near each other. At search time, your query is turned into a vector with the same model, and the system returns the stored vectors closest to it (usually by cosine similarity). Similar meaning equals nearby vectors, so search becomes a nearest-neighbor lookup.

Do I need a vector database for semantic search?

Not for a few hundred documents — you can hold the vectors in memory and compare them directly, like the Python example here. Once you reach tens of thousands or millions of vectors, a vector database gives you fast approximate-nearest-neighbor search, persistence, and filtering, which an in-memory loop can't provide at that scale.

What are common use cases for semantic search?

Help-center and documentation search, e-commerce product discovery, internal knowledge bases, "more like this" recommendations, finding duplicate or related support tickets, and — most importantly today — the retrieval step in RAG, where it pulls the right context for an LLM to answer questions accurately.

Is semantic search the same as RAG?

No. Semantic search is one component of RAG. RAG (retrieval-augmented generation) uses semantic search to find relevant text, then feeds that text to a large language model to generate an answer. Semantic search returns documents; RAG returns a written response grounded in those documents.

Can I combine semantic and keyword search?

Yes, and you usually should — it's called hybrid search. You run both, then merge the ranked lists (often with Reciprocal Rank Fusion). This gives you the exact-match precision of keyword search and the synonym-and-paraphrase recall of semantic search in a single result set.
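
Reciprocal Rank Fusion is short enough to sketch in full: every document earns 1 / (k + rank) from each list it appears in, and the totals decide the merged order. The document IDs below are invented for the example, and k = 60 is the commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document IDs; higher fused score means better rank."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two searches, best match first.
keyword_hits = ["doc_sku", "doc_cancel", "doc_hours"]
semantic_hits = ["doc_cancel", "doc_subscription", "doc_sku"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Because it uses only rank positions, RRF sidesteps the problem that keyword scores and cosine similarities live on different scales. Documents that both searches rank highly rise to the top.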

Further reading