AI/TLDR

What Is GraphRAG? Knowledge Graphs Meet Retrieval

You'll understand how GraphRAG builds and queries a knowledge graph from documents, and which question types it answers better than vector RAG.

INTERMEDIATE · 14 MIN READ · UPDATED 2026-06-12

In plain English

Imagine two ways to learn everything in a vast library. The first way: you photograph every page, then search by visual similarity when a question arrives, so you find pages that look related to your query. This is vector RAG. The second way: a librarian reads every book and draws a giant map connecting every person, place, concept, and event by name: "Einstein worked at Princeton", "Princeton is in New Jersey", "E=mc² was published in 1905". When you ask a question, the librarian doesn't just look up similar pages; they trace the connections. This is GraphRAG.

GraphRAG (Graph Retrieval-Augmented Generation) is an architecture that first converts your documents into a knowledge graph — a network of entities (people, organizations, concepts, places, products) connected by typed relationships — and then retrieves answers by traversing that graph rather than just matching text chunks. Where vector RAG says "find me the most semantically similar passages", GraphRAG says "find the entities involved and follow their connections".

The approach was formalized by Microsoft Research in the 2024 paper "From Local to Global: A Graph RAG Approach to Query-Focused Summarization", which was accompanied by an open-source Python implementation (pip install graphrag) that earned over 20,000 GitHub stars within weeks of its July 2024 release.

Why it matters

Standard vector RAG has a well-understood failure mode: it excels at local, single-hop questions ("What does document X say about Y?") but struggles with questions that require connecting information spread across many documents. The embedding model finds chunks that are locally similar to the query, but has no way to stitch together a chain of facts that spans dozens of source documents.

Consider the kind of questions that break vector RAG:

  • Global sensemaking. "Give me a thematic summary of all the risk factors mentioned across our 500 quarterly reports." No single chunk contains that answer. Vector search returns the most similar passages, but you need the whole picture.
  • Multi-hop relationship questions. "Which of our engineers collaborated with the team that built the authentication module?" Answering requires tracing person → team → project → module: three hops across four entities that live in different documents.
  • Cross-document entity tracking. "How has the position of our CEO on remote work evolved over the last three years?" Every instance where the CEO spoke needs to be found, attributed, and compared — not just the passage most similar to the query string.
  • Implicit connectivity. Two documents never mention each other by name, but both discuss the same undisclosed acquisition target. A graph that links them through shared entities surfaces the connection; vector similarity alone won't.
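
To make the multi-hop case concrete, here is a minimal sketch of following a chain of relationships with breadth-first search. The toy `graph` dictionary, the entity names, and the `find_path` helper are illustrative assumptions, not part of any GraphRAG library.

```python
from collections import deque


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical facts, each of which could come from a different document.
graph = {
    "Alice": ["Platform Team"],
    "Platform Team": ["Login Project"],
    "Login Project": ["Authentication Module"],
}

path = find_path(graph, "Alice", "Authentication Module")
print(" -> ".join(path))
```

No single edge answers the question; the answer only appears once the edges are chained, which is the step vector similarity has no mechanism for.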

Reported benchmark results point the same way: on complex multi-hop tasks, GraphRAG-style systems are commonly cited at 80-85% accuracy where vector RAG stalls around 45-50%, though the exact figures depend heavily on the dataset and the baseline. In compliance and healthcare domains, where queries genuinely require multi-hop traversal, improvements of 3-4x over baseline vector RAG have been reported. Microsoft's original paper evaluated global sensemaking questions and found that GraphRAG produced more comprehensive and more diverse answers than naive vector RAG on the same corpus.

For builders, the practical signal is: if your vector RAG pipeline scores well on simple lookups in evaluation but collapses on the complex questions your users actually care about, GraphRAG is the standard next step to investigate.

How it works

GraphRAG splits its work across two distinct phases: offline indexing (building the graph from your documents) and online querying (retrieving from it at question time). Both phases are LLM-heavy — this is not a graph database problem alone.

Phase 1: Indexing — building the knowledge graph

The pipeline begins by splitting your source documents into short text units — typically 300 to 600 tokens each, similar to ordinary RAG chunking. Then, for every chunk, an LLM runs entity extraction: it identifies all named entities (people, organizations, locations, products, concepts) and all relationships between them, producing structured output like {"entity": "Alice", "type": "Person"} and {"source": "Alice", "relation": "works_at", "target": "Acme Corp"}. This extraction step is where the LLM token cost concentrates — the default Microsoft GraphRAG configuration makes 4-6 LLM calls per chunk.
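
As a rough sketch of what the extraction step hands to the rest of the pipeline, the snippet below parses one chunk's extraction result into typed records. The JSON layout follows the example shapes above. The `Entity` and `Relationship` classes, the `parse_extraction` helper, and the hard-coded `raw` string standing in for a model response are all illustrative assumptions, not Microsoft GraphRAG's internal format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: str
    relation: str
    target: str


def parse_extraction(raw: str) -> tuple[list[Entity], list[Relationship]]:
    """Parse a JSON extraction result into entity and relationship records."""
    data = json.loads(raw)
    entities = [Entity(e["entity"], e["type"]) for e in data.get("entities", [])]
    relationships = [
        Relationship(r["source"], r["relation"], r["target"])
        for r in data.get("relationships", [])
    ]
    return entities, relationships


# Hypothetical model output for a single chunk (hard-coded, no real LLM call).
raw = """
{
  "entities": [
    {"entity": "Alice", "type": "Person"},
    {"entity": "Acme Corp", "type": "Organization"}
  ],
  "relationships": [
    {"source": "Alice", "relation": "works_at", "target": "Acme Corp"}
  ]
}
"""

entities, relationships = parse_extraction(raw)
print(entities)
print(relationships)
```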

Once all entities and relationships are extracted, identical entities are merged across chunks ("Alice" appearing in 50 different documents becomes one node with 50 source edges) and assembled into a single knowledge graph stored as nodes and edges. The graph can be persisted in a graph database like Neo4j or as flat Parquet files, as the open-source implementation does by default.
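
The merge step can be sketched as follows. This version assumes the simplest possible entity resolution, a case-insensitive match on the name, and keeps the graph in plain dictionaries; the `normalize` and `build_graph` helpers and the sample data are hypothetical, and a real pipeline would need fuzzier matching.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Naive canonical key: case-insensitive, whitespace-collapsed."""
    return " ".join(name.lower().split())


def build_graph(extractions):
    """Merge per-chunk extractions into one graph.

    `extractions` maps a chunk id to (entity_names, triples), where each
    triple is (source, relation, target).
    """
    nodes = defaultdict(set)  # entity key -> ids of chunks that mention it
    edges = defaultdict(set)  # (source, relation, target) -> supporting chunk ids
    for chunk_id, (names, triples) in extractions.items():
        for name in names:
            nodes[normalize(name)].add(chunk_id)
        for source, relation, target in triples:
            edges[(normalize(source), relation, normalize(target))].add(chunk_id)
    return dict(nodes), dict(edges)


extractions = {
    "chunk-1": (["Alice", "Acme Corp"], [("Alice", "works_at", "Acme Corp")]),
    "chunk-2": (["alice", "Auth Module"], [("alice", "maintains", "Auth Module")]),
}
nodes, edges = build_graph(extractions)
print(sorted(nodes["alice"]))  # both chunks point at the single merged node
```

Keeping the chunk ids on every node and edge is what later lets a query pull the original source text back in.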

The final indexing step is community detection. GraphRAG applies the Leiden algorithm — a hierarchical graph-clustering method — to group the entity graph into communities of closely connected nodes. Think of a community as a cluster like "all entities related to our authentication system" or "all people in the European sales team". For each community at each level of the hierarchy, the LLM generates a natural-language community summary report — a paragraph describing what that cluster of entities is about. These summaries are the secret ingredient for answering global questions.
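
Once nodes have community labels, preparing the input for a community summary is mostly bookkeeping. The sketch below assumes the clustering has already been done and only groups the facts that fall inside each community into one prompt. The `community_prompts` helper, the prompt wording, and the sample labels are illustrative assumptions.

```python
from collections import defaultdict


def community_prompts(community_of, triples):
    """Group intra-community facts into one summarization prompt per community."""
    facts = defaultdict(list)
    for source, relation, target in triples:
        # Keep only facts whose two endpoints sit in the same community.
        if community_of[source] == community_of[target]:
            facts[community_of[source]].append(f"{source} {relation} {target}")
    return {
        community: "Summarize what this group of entities is about:\n"
        + "\n".join(lines)
        for community, lines in facts.items()
    }


# Hypothetical community labels, as a clustering step might assign them.
community_of = {
    "Alice": "auth",
    "Auth Module": "auth",
    "Bob": "sales",
    "EU Sales Team": "sales",
}
triples = [
    ("Alice", "maintains", "Auth Module"),
    ("Bob", "member_of", "EU Sales Team"),
    ("Alice", "mentors", "Bob"),  # crosses communities, so it is left out
]

prompts = community_prompts(community_of, triples)
print(prompts["auth"])
```

Each prompt would then be sent to the LLM, and the response stored as that community's summary report.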

Phase 2: Querying — two different search modes

At query time, Microsoft GraphRAG offers two primary modes depending on the nature of the question, plus a newer hybrid:

Local search is for questions about specific, named entities. The system finds the relevant entities in the graph, then fans out to their immediate neighbors and relationships, and also pulls in the original source text chunks associated with those entities. It combines the structured graph data with the unstructured source text to ground the answer — accurate for specific entity questions, and fast.
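
A minimal sketch of that fan-out, under simplifying assumptions: entities are matched by plain substring lookup (a real system would typically match them semantically), and the graph, chunk index, and `local_search_context` helper are all hypothetical.

```python
def local_search_context(question, graph, chunks_by_entity, chunk_text):
    """Collect graph facts and source text for entities named in the question."""
    q = question.lower()
    seeds = [entity for entity in graph if entity.lower() in q]
    facts, chunk_ids = [], set()
    for entity in seeds:
        chunk_ids.update(chunks_by_entity.get(entity, ()))
        # Fan out one hop to the entity's immediate neighbors.
        for relation, neighbor in graph[entity]:
            facts.append(f"{entity} --{relation}--> {neighbor}")
            chunk_ids.update(chunks_by_entity.get(neighbor, ()))
    sources = [chunk_text[c] for c in sorted(chunk_ids)]
    return facts, sources


graph = {
    "Auth Module": [("built_by", "Platform Team")],
    "Platform Team": [("includes", "Alice")],
    "Alice": [("works_at", "Acme Corp")],
}
chunks_by_entity = {
    "Auth Module": ["c1"],
    "Platform Team": ["c1", "c2"],
    "Alice": ["c3"],
}
chunk_text = {
    "c1": "The Platform Team built the Auth Module in 2022.",
    "c2": "The Platform Team reports to engineering.",
    "c3": "Alice joined Acme Corp last year.",
}

facts, sources = local_search_context(
    "Who built the Auth Module?", graph, chunks_by_entity, chunk_text
)
print(facts)
print(sources)
```

The returned facts and source passages together form the context that would be passed to the LLM for the final answer.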

Global search is for questions that require synthesizing information across the entire corpus: the "thematic overview" and "cross-document" questions where vector RAG fails. Rather than matching individual chunks, global search works over the community summaries in a map-reduce fashion: the LLM drafts a partial answer from each batch of summaries and rates how helpful it is, then the most helpful partial answers are combined into the final response. Because the community summaries were already written to capture the essence of each entity cluster, the model can synthesize a coherent whole-corpus answer without needing to read every source document.
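
The control flow of a summary-based global search can be sketched as a small map-reduce. Everything model-related is stubbed here: `fake_map` and `fake_reduce` are stand-ins for LLM calls, the keyword-overlap score is an assumption made so the example runs offline, and `global_search` is a hypothetical helper rather than the library's API.

```python
import re


def global_search(question, summaries, map_fn, reduce_fn, top_k=2):
    """Map: score a partial answer per summary. Reduce: merge the best ones."""
    scored = [map_fn(question, summary) for summary in summaries]
    useful = [(answer, score) for answer, score in scored if score > 0]
    useful.sort(key=lambda pair: pair[1], reverse=True)
    return reduce_fn(question, [answer for answer, _ in useful[:top_k]])


def keywords(text):
    """Lowercased words longer than three letters."""
    return {word for word in re.findall(r"[a-z]+", text.lower()) if len(word) > 3}


def fake_map(question, summary):
    """Stand-in for an LLM call: echo the summary, score by shared keywords."""
    return summary, len(keywords(question) & keywords(summary))


def fake_reduce(question, answers):
    """Stand-in for the final LLM synthesis call."""
    return " ".join(answers)


summaries = [
    "Supply chain risk dominates the manufacturing reports.",
    "The European sales team expanded in 2023.",
    "Currency risk appears across the finance reports.",
]
answer = global_search(
    "What risk themes appear in the reports?", summaries, fake_map, fake_reduce
)
print(answer)
```

In a real deployment the map step is where most of the query-time cost goes, since it touches many summaries per question.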

DRIFT search (introduced by Microsoft Research in late 2024) is a hybrid that starts with global community context to orient itself, then zooms in with local entity traversal for specifics — combining global coverage with the precision of entity-level lookup.

GraphRAG vs vector RAG: when each wins

This is fundamentally a question of what kind of questions your users ask. Neither approach is universally better — they occupy different positions on an accuracy-versus-cost tradeoff curve.

| Dimension | Vector RAG | GraphRAG |
| --- | --- | --- |
| Best question type | Single-hop, specific lookup | Multi-hop, relational, global synthesis |
| Indexing cost | Low: only embedding calls | High: LLM calls per chunk, 20-100x more |
| Indexing time | Minutes to hours | Hours to days for large corpora |
| Query cost | Low: one embedding + ANN search | Moderate to high: LLM synthesis over summaries |
| Multi-hop accuracy | 45-50% on benchmarks | 80-85% on benchmarks |
| Global sensemaking | Weak: returns similar chunks, not synthesis | Strong: community summaries enable corpus-wide answers |
| Setup complexity | Low: days to stand up | High: weeks to months of ontology work |
| Update cost | Cheap: re-embed changed chunks | Expensive: graph rebuild for entity changes |

A practical decision rule: start with vector RAG and measure. Run an evaluation suite that includes both simple lookups and complex multi-hop questions. If your accuracy on simple questions is strong but drops sharply on the complex ones, that gap is exactly what GraphRAG closes. If your traffic is overwhelmingly simple lookups, GraphRAG adds cost for no gain.
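
One way to make that measurement concrete is to tag each evaluation question by type and compare accuracy per tag. The sketch below uses made-up results purely to show the calculation; the category names and the `accuracy_by_category` helper are assumptions.

```python
def accuracy_by_category(results):
    """results: iterable of (category, is_correct) pairs."""
    totals, correct = {}, {}
    for category, ok in results:
        totals[category] = totals.get(category, 0) + 1
        correct[category] = correct.get(category, 0) + int(ok)
    return {category: correct[category] / totals[category] for category in totals}


# Illustrative results only, not real benchmark data.
results = [
    ("simple", True), ("simple", True), ("simple", True), ("simple", False),
    ("multi_hop", True), ("multi_hop", False),
    ("multi_hop", False), ("multi_hop", False),
]

scores = accuracy_by_category(results)
gap = scores["simple"] - scores["multi_hop"]
print(scores, gap)
```

A large gap on questions your users actually ask is the signal to investigate GraphRAG; a small one suggests the extra indexing cost would buy little.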

The hybrid approach is increasingly standard in production: use vector retrieval for the simple majority of queries (fast, cheap, accurate enough) and layer GraphRAG on top for the specific question types that need relational reasoning. A query classifier routes each incoming question to the appropriate pipeline based on complexity signals — simple factoid questions take the fast vector path; complex relationship and synthesis questions go to GraphRAG.

Tools and ecosystem

GraphRAG has a growing tooling ecosystem centered on three layers: the graph extraction pipeline, the graph database, and the query interface.

Microsoft GraphRAG (open source)

The reference implementation from Microsoft Research is a Python package (pip install graphrag) that handles the full indexing pipeline out of the box — chunking, entity extraction via an LLM, graph construction, Leiden clustering, and community summary generation. It stores the graph as Parquet files and exposes CLI commands for both indexing and querying. Configuration is via a YAML file where you specify your LLM (defaults to GPT-4 via Azure OpenAI or the OpenAI API), chunk sizes, and entity types to extract.

Getting started with Microsoft GraphRAG (bash)
pip install graphrag

# Initialize a new project
python -m graphrag init --root ./my-project

# Place your .txt source files in ./my-project/input/ (plain text is the default input type; convert PDFs to text first)
# Edit ./my-project/settings.yaml to configure your LLM

# Run the full indexing pipeline (this will call the LLM for every chunk)
python -m graphrag index --root ./my-project

# Query with global search (corpus-wide synthesis)
python -m graphrag query --root ./my-project --method global --query "What are the main themes across all documents?"

# Query with local search (specific entity lookup)
python -m graphrag query --root ./my-project --method local --query "Who is responsible for the authentication module?"

Neo4j + LangChain / LlamaIndex

Neo4j is the most widely adopted graph database for GraphRAG deployments. Neo4j ships a GraphRAG Python package and a Knowledge Graph Builder, a UI tool that transforms unstructured documents into a Neo4j graph using LLM extraction. LangChain integrates with Neo4j via Neo4jGraph and GraphCypherQAChain, allowing natural-language queries to be translated to Cypher graph queries. LlamaIndex offers a PropertyGraphIndex with a Neo4jPropertyGraphStore backend, on top of which the GraphRAG community detection and summarization pattern can be built.

Alternative graph databases

FalkorDB markets itself as a GraphRAG-native database with lower indexing costs than Neo4j. ArangoDB provides a hybrid graph + vector store that supports combined GraphRAG queries. For production deployments that need to update frequently, these purpose-built options can reduce the graph rebuild cost that makes standard GraphRAG expensive to maintain.

Going deeper

The cost equation is the main barrier to adoption. Indexing with GraphRAG costs 20-100x more than embedding-only indexing because every source chunk requires multiple LLM calls. For a 1 million token corpus split into 2,000 chunks, you're looking at 8,000-12,000 LLM calls for entity and relationship extraction alone, plus additional calls for community summarization. At GPT-4 pricing, this can run into hundreds of dollars for a medium-sized enterprise corpus. The emerging mitigation: use a cheap, fast model (GPT-4o mini, a fine-tuned local model) for extraction and reserve the stronger model only for community summarization, where output quality matters more.
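
The arithmetic behind those call counts is easy to reproduce for your own corpus. This sketch uses the figures from the paragraph above (a 1 million token corpus, roughly 500 tokens per chunk, 4-6 calls per chunk); the `estimate_indexing_calls` helper is hypothetical, and the token estimate deliberately ignores prompt templates and output tokens.

```python
def estimate_indexing_calls(corpus_tokens, tokens_per_chunk, calls_per_chunk):
    """Return (chunk count, min LLM calls, max LLM calls) for extraction."""
    chunks = corpus_tokens // tokens_per_chunk
    low, high = calls_per_chunk
    return chunks, chunks * low, chunks * high


chunks, low_calls, high_calls = estimate_indexing_calls(1_000_000, 500, (4, 6))
print(chunks, low_calls, high_calls)  # 2000 8000 12000

# Each call re-reads its chunk, so input tokens scale with the call count.
low_tokens = low_calls * 500
high_tokens = high_calls * 500
print(low_tokens, high_tokens)  # 4000000 6000000
```

Multiplying the token range by your provider's current per-token price gives a floor on extraction cost before community summarization is added.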

Graph freshness is the main operational challenge. When your documents update (new reports, modified policies, new people joining), the knowledge graph needs to reflect those changes. Unlike a vector store where you can re-embed changed chunks in isolation, a graph update can require re-running entity deduplication and community detection across the entire graph, which may trigger a full rebuild. Production systems address this with incremental graph updates: new entities are added as delta nodes, community detection is re-run only on the affected subgraph, and community summaries are regenerated only for changed communities. Recent releases of Microsoft GraphRAG include an update command for incremental indexing, and third-party graph databases like FalkorDB have built this into their core offering.

The Leiden algorithm is worth understanding. Leiden is an improvement on the Louvain algorithm for community detection, and GraphRAG applies it hierarchically: it detects communities at a coarse level, then recursively subdivides each community until it reaches leaf clusters that can't be partitioned further. This multi-level hierarchy is what gives GraphRAG its ability to answer questions at different granularities: a very broad question gets answered from top-level community summaries (a few dozen clusters summarizing the whole corpus); a narrower question can drill into a specific sub-community's summary. The depth of the hierarchy is tunable: deeper hierarchies give finer granularity but cost more to build and store.
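
Community detection algorithms in this family search for a partition that scores well on a quality function, commonly modularity. The sketch below does not implement Leiden itself (its local-moving, refinement, and aggregation phases are omitted); it only computes the modularity of a given partition, assuming an undirected, unweighted graph. The `modularity` helper and the toy graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2m))^2
    where m is the edge count, L_c the number of edges inside c, and
    d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (c-d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}

print(modularity(edges, split))   # higher: the two triangles are real clusters
print(modularity(edges, lumped))  # one big community scores zero
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the kind of improvement the clustering step searches for at each level of the hierarchy.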

Adaptive RAG is where the field is heading. Pure GraphRAG and pure vector RAG are both suboptimal for real production traffic, which is a mix of simple and complex questions. The emerging architecture — sometimes called Adaptive RAG — uses a lightweight query classifier to route each question: simple factoid lookups go to a fast vector pipeline, complex multi-hop or synthesis questions go to GraphRAG. The classifier itself can be a small fine-tuned model or even a rules-based heuristic (presence of relationship keywords, question length, named-entity count). This hybrid pattern gets vector-RAG speed on the easy 80% of queries and GraphRAG accuracy on the hard 20%.
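
A rules-based router along those lines might look like the sketch below. The keyword list, the thresholds, and the capitalized-word entity count are all assumptions chosen for illustration; a production classifier would be tuned against real query logs.

```python
RELATION_HINTS = (
    "collaborat", "relationship", "connected", "across",
    "compare", "evolve", "summar", "themes", "between",
)


def route(question, max_simple_words=12):
    """Return "graph" for likely multi-hop or synthesis questions, else "vector"."""
    q = question.lower()
    hint_hits = sum(hint in q for hint in RELATION_HINTS)
    word_count = len(q.split())
    # Crude named-entity count: capitalized words after the first token.
    entity_count = sum(word[0].isupper() for word in question.split()[1:])
    if hint_hits >= 1:
        return "graph"
    if entity_count >= 2 and word_count > max_simple_words:
        return "graph"
    return "vector"


print(route("What is the refund policy?"))
print(route("Which engineers collaborated with the team that built the authentication module?"))
```

Starting with a heuristic like this keeps routing cheap and easy to inspect; a small fine-tuned classifier can replace it once there is labeled traffic to train on.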

Schema-free vs schema-guided extraction is an active research debate. Microsoft's default GraphRAG is close to schema-free: apart from a short, configurable list of entity types, the LLM decides which entities and relationships to pull from each chunk. This is flexible but noisy, because the same concept can be extracted under different names across different chunks, requiring post-hoc entity resolution. Schema-guided extraction pre-defines the entity types and relationship types the LLM should look for (e.g., only extract Person, Organization, Product, and EmployedBy, Produces edges). Schema-guided is more precise, reduces deduplication cost, and produces a cleaner graph, but requires upfront ontology design work, which is exactly the weeks-to-months investment that makes GraphRAG harder to adopt than vector RAG.
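
One simple way to enforce a schema is to validate extracted triples after the fact. The sketch below reuses the example types from the paragraph above; the endpoint types assigned to each edge (Person to Organization, Organization to Product), the `filter_to_schema` helper, and the sample data are assumptions for illustration.

```python
# Allowed edge types and the (source type, target type) each one must connect.
ALLOWED_EDGES = {
    "EmployedBy": ("Person", "Organization"),
    "Produces": ("Organization", "Product"),
}


def filter_to_schema(entity_types, triples):
    """Keep only triples whose relation and endpoint types match the schema."""
    kept = []
    for source, relation, target in triples:
        expected = ALLOWED_EDGES.get(relation)
        actual = (entity_types.get(source), entity_types.get(target))
        if expected is not None and actual == expected:
            kept.append((source, relation, target))
    return kept


entity_types = {
    "Alice": "Person",
    "Acme Corp": "Organization",
    "Widget": "Product",
    "Berlin": "Location",
}
triples = [
    ("Alice", "EmployedBy", "Acme Corp"),
    ("Acme Corp", "Produces", "Widget"),
    ("Alice", "LivesIn", "Berlin"),        # relation not in the schema
    ("Widget", "EmployedBy", "Acme Corp"),  # wrong source type
]

clean = filter_to_schema(entity_types, triples)
print(clean)
```

Filtering after extraction is the cheapest form of schema guidance; putting the same schema into the extraction prompt avoids paying for the rejected triples in the first place.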

FAQ

What is the difference between GraphRAG and regular RAG?

Regular RAG chunks your documents into text segments, embeds them as vectors, and retrieves the most semantically similar chunks at query time. GraphRAG instead builds a knowledge graph from your documents — nodes are entities (people, organizations, concepts), edges are relationships between them — and retrieves by traversing that graph. The key advantage is that GraphRAG can answer questions requiring multi-hop reasoning across many documents, where vector similarity alone fails.

How expensive is GraphRAG to index?

Significantly more expensive than vector embedding. The indexing pipeline calls an LLM multiple times per text chunk to extract entities and relationships, making it 20-100x more expensive than embedding-only indexing. A medium-sized enterprise corpus of around 100,000 chunks can consume hundreds of millions of LLM tokens (each chunk of a few hundred tokens is read several times over) and take hours to days to process. Cost reduction strategies include using a smaller model for extraction (GPT-4o mini, a fine-tuned local model) and only using the full model for community summarization.

When should I use GraphRAG instead of vector RAG?

Use GraphRAG when your questions require multi-hop reasoning (tracing chains of relationships), cross-document entity tracking, or global synthesis across an entire corpus — the question types where vector RAG stalls at 45-50% accuracy while GraphRAG reaches 80-85%. Stick with vector RAG for simple single-hop lookups, where it's faster, cheaper, and equally accurate. The practical rule: audit your query logs and only invest in GraphRAG if a significant fraction of your real traffic is in the multi-hop or synthesis categories.

What is the Leiden algorithm and why does GraphRAG use it?

Leiden is a graph community-detection algorithm that groups densely connected nodes into clusters (communities). GraphRAG applies it to partition the entity graph into coherent topic clusters — "all entities related to our Q3 financials", for example. It then generates LLM-written summary reports for each community. These summaries are what power GraphRAG's global search mode, allowing the system to answer corpus-wide questions without reading every source document.

Can I use GraphRAG with models other than GPT-4?

Yes. Microsoft's open-source GraphRAG implementation accepts any OpenAI-compatible API endpoint, so you can substitute GPT-4o, GPT-4o mini, locally hosted models via Ollama, or any other compatible provider by editing the settings.yaml config. Smaller models can be used for entity extraction (lower cost, slight quality reduction) while you reserve a stronger model for community summarization where output quality matters more.

Does GraphRAG replace vector databases?

No. GraphRAG typically complements a vector store rather than replacing it. The local search mode in GraphRAG actually combines graph traversal with vector retrieval — it uses the graph to find relevant entities and their neighbors, then pulls the original source text chunks (stored in a vector or document store) to ground the answer. In hybrid deployments, both a graph database and a vector database run side by side, with the query router deciding which one to invoke.

Further reading