AI/TLDR

GraphRAG

Build a knowledge graph from your documents and query it with an LLM

Overview

GraphRAG is a data pipeline from Microsoft Research that turns unstructured text into a structured knowledge graph using an LLM. Instead of only retrieving similar text chunks, it extracts entities and the relationships between them, then groups them into communities so the system can reason across a whole document set rather than one passage at a time.
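
To make the entity, relationship, and community idea concrete, here is a toy sketch in plain Python. It is not GraphRAG's code or output: the triples are invented, and grouping by connected component is only a simple stand-in for whatever community detection the real pipeline applies.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples, standing in for what an
# LLM extraction step might return. None of this is real GraphRAG output.
triples = [
    ("Alice", "works_for", "Acme"),
    ("Bob", "works_for", "Acme"),
    ("Carol", "founded", "Globex"),
]


def build_graph(triples):
    """Build an undirected adjacency map from relationship triples."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def communities(graph):
    """Group entities by connected component (a stand-in for real community detection)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


graph = build_graph(triples)
print(communities(graph))  # [['Acme', 'Alice', 'Bob'], ['Carol', 'Globex']]
```

Looking up `graph["Acme"]` returns an entity's direct neighbors, which is roughly the kind of question a local search targets, while the community groups hint at the corpus-wide view a global search summarizes.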

It is aimed at developers and teams who want to ask broad, sense-making questions over a private collection of documents — for example summarizing the main themes across many files — where plain vector search tends to miss connections that span multiple sources.

As a RAG framework, GraphRAG runs as a command-line workflow: you point it at a folder of text, it indexes that text into graph and Parquet outputs, and then you query it with either a global search (over the whole corpus) or a local search (focused on specific entities). The project is a research demonstration, not an officially supported Microsoft product.

What it does

  • Extracts a knowledge graph of entities and relationships from raw text using an LLM
  • Global search answers broad questions across the entire document set
  • Local search focuses on specific entities and their direct relationships
  • Command-line workflow: init, index, and query a project folder
  • Writes indexed results to Parquet files in an output directory for reuse
  • Includes a Prompt Tuning Guide for adapting extraction prompts to your own data

Getting started

Install the package, initialize a project, add your text, index it, then query. GraphRAG needs an OpenAI or Azure API key and can use significant LLM resources, so start with a small dataset.

Install GraphRAG

Install from PyPI. GraphRAG supports Python 3.10–3.12.

python -m pip install graphrag

Initialize a project

Create the workspace files. This generates a .env file, a settings.yaml, and an input/ directory. Add your OpenAI or Azure key as GRAPHRAG_API_KEY in the .env file, and drop text files into input/.

graphrag init --root ./ragtest
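
Because indexing spends LLM calls, it can help to confirm the workspace is complete before running it. The following is a minimal sketch, not part of GraphRAG: `check_workspace` is a hypothetical helper that only looks for the files named above and for a non-empty GRAPHRAG_API_KEY line. It does not validate the key itself.

```python
from pathlib import Path


def check_workspace(root):
    """Return a list of human-readable problems found in a project folder."""
    root = Path(root)
    problems = []

    env_file = root / ".env"
    if not env_file.is_file():
        problems.append("missing .env")
    else:
        # Look for a GRAPHRAG_API_KEY=... line with something after the "=".
        has_key = any(
            line.strip().startswith("GRAPHRAG_API_KEY=")
            and line.strip() != "GRAPHRAG_API_KEY="
            for line in env_file.read_text(encoding="utf-8").splitlines()
        )
        if not has_key:
            problems.append("GRAPHRAG_API_KEY not set in .env")

    if not (root / "settings.yaml").is_file():
        problems.append("missing settings.yaml")

    input_dir = root / "input"
    if not input_dir.is_dir() or not any(p.is_file() for p in input_dir.iterdir()):
        problems.append("no files in input/")

    return problems


if __name__ == "__main__":
    print(check_workspace("./ragtest") or "workspace looks ready")
```

An empty list means the three pieces are in place. Anything else names what is still missing.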

Index your documents

Run the indexing pipeline to build the knowledge graph. Results are written as Parquet files under the output directory. Indexing makes many LLM calls, so begin small.

graphrag index --root ./ragtest
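
To see what an indexing run produced, you can list the Parquet files it wrote. This sketch uses only the standard library and assumes the output directory is a folder named output inside the project root. Check settings.yaml if your layout differs. `list_parquet_outputs` is a hypothetical helper, and it makes no assumption about the individual file names.

```python
from pathlib import Path


def list_parquet_outputs(root, output_dir="output"):
    """Return (relative path, size in bytes) for each Parquet file under the output folder."""
    base = Path(root) / output_dir
    if not base.is_dir():
        return []
    return sorted(
        (path.relative_to(base).as_posix(), path.stat().st_size)
        for path in base.rglob("*.parquet")
    )


if __name__ == "__main__":
    for name, size in list_parquet_outputs("./ragtest"):
        print(f"{size:>12,d}  {name}")
```

An empty result usually means indexing has not finished or the output lives somewhere other than the assumed folder.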

Query the graph

Ask a broad question over the whole corpus, or use --method local to focus on a specific entity and its relationships.

graphrag query --root ./ragtest "What are the top themes in this story?"
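
If you want to script queries, the same command can be assembled from Python. This is a sketch under assumptions: it mirrors the command form shown above, places --method before the question (the flag position is assumed, so confirm it with the CLI's help output), and `build_query_command` and `run_query` are hypothetical helper names.

```python
import subprocess


def build_query_command(root, question, method=None):
    """Assemble the argument list for a `graphrag query` call."""
    command = ["graphrag", "query", "--root", str(root)]
    if method is not None:
        # For example "local" to focus on a specific entity.
        command += ["--method", method]
    command.append(question)
    return command


def run_query(root, question, method=None):
    """Run the query and return whatever the CLI prints to stdout."""
    result = subprocess.run(
        build_query_command(root, question, method),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return result.stdout
```

For example, `run_query("./ragtest", "Who is the main character?", method="local")` would request a local search. The question here is made up for illustration.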

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Summarize the main themes across a large private document collection where vector search alone misses cross-document links
  • Answer questions that require connecting facts spread over many separate files
  • Explore entities and their relationships in a corpus, such as people, organizations, and how they connect
  • Prototype a graph-based RAG approach over narrative or domain text before building a production system

How GraphRAG compares

GraphRAG alongside other open-source RAG frameworks and platforms that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Dify | ★ 146k | An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code.
RAGFlow | ★ 83.2k | A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer.
Context7 | ★ 57.7k | Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server.
Quivr | ★ 39.2k | Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type.
LightRAG | ★ 36.8k | A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates.
GraphRAG | ★ 33.9k | Build a knowledge graph from your documents and query it with an LLM.
PageIndex | ★ 33.2k | PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking.
FastGPT | ★ 28.6k | FastGPT is an open-source AI agent platform that pairs a built-in knowledge base with a drag-and-drop Flow editor, so you can build question-answering apps without heavy setup.