GraphRAG

Build a knowledge graph from your documents and query it with an LLM

github.com/microsoft/graphrag (★ 33.9k)
microsoft.github.io/graphrag

Overview

GraphRAG is a data pipeline from Microsoft Research that turns unstructured text into a structured knowledge graph using an LLM. Instead of only retrieving similar text chunks, it extracts entities and the relationships between them, then groups them into communities so the system can reason across a whole document set rather than one passage at a time.

It is aimed at developers and teams who want to ask broad, sense-making questions over a private collection of documents — for example summarizing the main themes across many files — where plain vector search tends to miss connections that span multiple sources.

As a RAG framework, GraphRAG runs as a command-line workflow: you point it at a folder of text, it indexes that text into graph and Parquet outputs, and then you query it with either a global search (over the whole corpus) or a local search (focused on specific entities). The project is a research demonstration, not an officially supported Microsoft product.
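To make the graph-and-communities idea concrete, here is a toy sketch in plain Python. It is not GraphRAG's code: the triples are hand-written stand-ins for what the LLM extraction step would produce, and connected components are a deliberately simple substitute for the community grouping GraphRAG performs. All names and data are hypothetical.

```python
from collections import defaultdict

# Hand-written (entity, relation, entity) triples. In GraphRAG an LLM would
# extract these from text; the data here is invented for illustration.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Carol", "founded", "Globex"),
    ("Globex", "based_in", "Springfield"),
]


def build_graph(triples):
    """Return an undirected adjacency map: entity -> set of neighbours."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def connected_components(graph):
    """Group entities into components (a simple stand-in for communities)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


communities = connected_components(build_graph(TRIPLES))
print(communities)
```

Each resulting group links entities that never appear in the same sentence, which is the kind of cross-passage connection the overview describes.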

What it does

Extracts a knowledge graph of entities and relationships from raw text using an LLM
Global search answers broad questions across the entire document set
Local search focuses on specific entities and their direct relationships
Command-line workflow: init, index, and query a project folder
Writes indexed results to Parquet files in an output directory for reuse
Includes a Prompt Tuning Guide for adapting extraction prompts to your own data
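The difference between the two search modes can be sketched with toy data. This is an illustration only, not GraphRAG's implementation: the real searches hand retrieved context to an LLM, while this sketch only shows what each mode gathers. The per-community summaries, function names, and data are assumptions made for the example.

```python
# Toy data, invented for illustration: relationships plus a one-line
# summary for each (hypothetical) community.
RELATIONSHIPS = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Carol", "founded", "Globex"),
]
COMMUNITY_SUMMARIES = {
    "acme": "Alice and Bob both work at Acme.",
    "globex": "Carol founded Globex.",
}


def local_context(entity, relationships):
    """Local-style lookup: only the relationships that touch one entity."""
    return [r for r in relationships if entity in (r[0], r[2])]


def global_context(summaries):
    """Global-style lookup: one summary per community, across the corpus."""
    return [summaries[name] for name in sorted(summaries)]


print(local_context("Acme", RELATIONSHIPS))
print(global_context(COMMUNITY_SUMMARIES))
```

A question about one company fits the local shape; a question about overall themes needs the global one, since no single entity's neighbourhood covers the whole corpus.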

Getting started

Install the package, initialize a project, add your text, index it, then query. GraphRAG needs an OpenAI or Azure API key and can use significant LLM resources, so start with a small dataset.

Install GraphRAG

Install from PyPI. GraphRAG supports Python 3.10–3.12.

python -m pip install graphrag
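Before installing, it may help to confirm that the interpreter is in the supported range. The check below is a minimal sketch based only on the 3.10–3.12 range stated above; the helper name is hypothetical and not part of GraphRAG.

```python
import sys

SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) falls within 3.10-3.12 inclusive."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is outside "
        "3.10-3.12; consider a virtual environment with a supported interpreter."
    )
```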

Initialize a project

Create the workspace files. This generates a .env file, a settings.yaml, and an input/ directory. Add your OpenAI or Azure key as GRAPHRAG_API_KEY in the .env file, and drop text files into input/.

graphrag init --root ./ragtest
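The two manual steps after init (setting the key and adding text) could be scripted roughly as follows. This is a sketch under assumptions: it overwrites .env on the assumption that the file only needs the GRAPHRAG_API_KEY line, and the helper name, file name, and sample text are placeholders. The demo runs in a throwaway directory; for real use, pass your project folder such as ./ragtest.

```python
import tempfile
from pathlib import Path


def prepare_workspace(root, api_key, sample_name, sample_text):
    """Write the API key to .env and add one text file under input/."""
    root = Path(root)
    input_dir = root / "input"
    input_dir.mkdir(parents=True, exist_ok=True)  # harmless if it already exists
    # Assumption: .env only needs this one line, so overwriting is acceptable.
    (root / ".env").write_text(f"GRAPHRAG_API_KEY={api_key}\n", encoding="utf-8")
    sample_path = input_dir / sample_name
    sample_path.write_text(sample_text, encoding="utf-8")
    return sample_path


# Demo in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "ragtest"
    path = prepare_workspace(project, "<your-api-key>", "notes.txt", "Some text to index.")
    env_line = (project / ".env").read_text(encoding="utf-8")
    print(path.name, env_line.strip())
```

Keep the real key out of version control; the placeholder string above is not a valid key.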

Index your documents

Run the indexing pipeline to build the knowledge graph. Results are written as Parquet files under the output directory. Indexing makes many LLM calls, so begin small.

graphrag index --root ./ragtest
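Once indexing finishes, a quick way to see what was produced is to list the Parquet files. The sketch below assumes the output directory is named output and sits under the project root; adjust if your settings differ. The helper and the file names in the demo are placeholders, not GraphRAG's actual output names.

```python
import tempfile
from pathlib import Path


def list_parquet_outputs(root, output_dir="output"):
    """Return sorted names of Parquet files under <root>/<output_dir>."""
    out = Path(root) / output_dir
    if not out.is_dir():
        return []
    # rglob also finds files in nested subfolders, if any exist.
    return sorted(p.name for p in out.rglob("*.parquet"))


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"
    out.mkdir()
    for name in ("example_b.parquet", "example_a.parquet", "notes.log"):
        (out / name).touch()
    found = list_parquet_outputs(tmp)

print(found)
```

For a real project, call list_parquet_outputs("./ragtest"); an empty list suggests indexing has not completed or wrote elsewhere.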

Query the graph

Ask a broad question over the whole corpus, or use --method local to focus on a specific entity and its relationships.

graphrag query --root ./ragtest "What are the top themes in this story?"
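If you script queries, the global and local variants can be built as argument lists. This sketch only assembles and prints the commands; it follows the command form shown above and the --method local flag mentioned in the text. Exact flag placement may differ between versions, so check graphrag query --help. The helper name and the second question are hypothetical.

```python
import shlex


def build_query_command(root, question, method=None):
    """Build argv for `graphrag query`, following the command form above."""
    command = ["graphrag", "query", "--root", root]
    if method is not None:
        command += ["--method", method]  # e.g. "local", as described above
    command.append(question)
    return command


global_cmd = build_query_command("./ragtest", "What are the top themes in this story?")
local_cmd = build_query_command("./ragtest", "Who is the main character?", method="local")

# Print shell-quoted commands; pass the lists to subprocess.run to execute them.
print(shlex.join(global_cmd))
print(shlex.join(local_cmd))
```

Building the list rather than a single string avoids quoting problems when the question contains spaces or punctuation.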

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Summarize the main themes across a large private document collection where vector search alone misses cross-document links
Answer questions that require connecting facts spread over many separate files
Explore entities and their relationships in a corpus, such as people, organizations, and how they connect
Prototype a graph-based RAG approach over narrative or domain text before building a production system

How GraphRAG compares

GraphRAG alongside other open-source RAG frameworks and platforms tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
Dify	★ 146k	An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code.
RAGFlow	★ 83.2k	A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer.
Context7	★ 57.7k	Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server.
Quivr	★ 39.2k	Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type.
LightRAG	★ 36.8k	A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates.
GraphRAG	★ 33.9k	Build a knowledge graph from your documents and query it with an LLM.
PageIndex	★ 33.2k	PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking.
FastGPT	★ 28.6k	FastGPT is an open-source AI agent platform that pairs a built-in knowledge base with a drag-and-drop Flow editor, so you can build question-answering apps without heavy setup.
