AI/TLDR

LightRAG

Graph-based RAG that turns your documents into a queryable knowledge graph

Overview

LightRAG is an open-source Python framework for retrieval-augmented generation (RAG) from HKUDS. Instead of only chunking text and matching embeddings, it extracts entities and the relationships between them to build a knowledge graph, then uses that graph alongside vector search when answering a query. The package is published on PyPI as lightrag-hku.
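To make the entity-and-relationship idea concrete, here is a minimal sketch that assumes the (entity, relation, entity) triples have already been extracted; in LightRAG an LLM performs that extraction step. The class `ToyKnowledgeGraph` and its methods are hypothetical illustrations, not LightRAG's internal storage or API. It shows how a graph lets retrieval hop from an entity named in a question to related entities that a pure nearest-chunk search might miss.

```python
from collections import defaultdict


class ToyKnowledgeGraph:
    """Hypothetical in-memory entity graph; not LightRAG's internal storage."""

    def __init__(self) -> None:
        # entity -> set of (relation, other_entity) pairs, stored from both ends
        self.edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        # Record the relationship on both endpoints so either entity can reach the other.
        self.edges[head].add((relation, tail))
        self.edges[tail].add((relation, head))

    def neighborhood(self, entity: str, hops: int = 1) -> set[str]:
        """Return entities reachable from `entity` within `hops` edges (excluding itself)."""
        frontier, seen = {entity}, {entity}
        for _ in range(hops):
            next_frontier = set()
            for node in frontier:
                for _, other in self.edges.get(node, ()):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.add(other)
            frontier = next_frontier
        return seen - {entity}


kg = ToyKnowledgeGraph()
kg.add_triple("Marie Curie", "married", "Pierre Curie")
kg.add_triple("Marie Curie", "discovered", "Radium")
kg.add_triple("Pierre Curie", "taught at", "Sorbonne")

print(sorted(kg.neighborhood("Marie Curie")))          # ['Pierre Curie', 'Radium']
print(sorted(kg.neighborhood("Marie Curie", hops=2)))  # ['Pierre Curie', 'Radium', 'Sorbonne']
```

The two-hop lookup reaches "Sorbonne" even though no single triple links it to "Marie Curie", which is the kind of cross-entity connection graph-based retrieval is meant to surface.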

It is aimed at developers who are building question-answering or assistant features over their own documents and want retrieval that can reason across entities and documents, not just return the nearest text chunks. Because the graph supports incremental updates and document deletion with automatic regeneration, you can keep a corpus current without rebuilding everything from scratch.
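Deletion needs graph regeneration because a single relationship can be supported by several documents: removing one document should drop only the facts that no remaining document supports. The sketch below tracks provenance per triple to illustrate that idea. `ProvenanceGraph` is hypothetical and is not how LightRAG implements deletion internally.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head, relation, tail)


class ProvenanceGraph:
    """Hypothetical graph that remembers which documents support each triple."""

    def __init__(self) -> None:
        self.sources: dict[Triple, set[str]] = defaultdict(set)

    def insert_document(self, doc_id: str, triples: list[Triple]) -> None:
        # Incremental insert: only this document's triples are touched.
        for triple in triples:
            self.sources[triple].add(doc_id)

    def delete_document(self, doc_id: str) -> list[Triple]:
        """Remove doc_id as a source; drop and return triples left unsupported."""
        orphaned = []
        for triple, docs in list(self.sources.items()):
            docs.discard(doc_id)
            if not docs:
                orphaned.append(triple)
                del self.sources[triple]
        return orphaned

    def triples(self) -> set[Triple]:
        return set(self.sources)


g = ProvenanceGraph()
g.insert_document("doc-1", [("Alice", "works at", "Acme"), ("Acme", "based in", "Berlin")])
g.insert_document("doc-2", [("Alice", "works at", "Acme")])

removed = g.delete_document("doc-1")
print(removed)      # [('Acme', 'based in', 'Berlin')]
print(g.triples())  # {('Alice', 'works at', 'Acme')}
```

The "works at" fact survives because doc-2 still supports it, while the Berlin fact disappears with doc-1.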

Within the RAG framework space, LightRAG is an alternative both to vector-only pipelines and to other graph-RAG systems. It offers several query modes (naive, local, global, hybrid, and mix), pluggable storage backends such as PostgreSQL, MongoDB, Neo4j, and OpenSearch, a built-in WebUI, and an optional API server, so you can start small and grow into a larger deployment.

What it does

  • Builds an entity-and-relationship knowledge graph from your documents to support retrieval that reasons across entities and sources
  • Five query modes (naive, local, global, hybrid, and mix) so you can trade off precise local matching against broad cross-document reasoning
  • Incremental updates and document deletion with automatic knowledge-graph regeneration to keep retrieval accurate over time
  • Pluggable storage backends including PostgreSQL, MongoDB, Neo4j, and OpenSearch for graph, vector, and key-value data
  • Works with OpenAI and open-source LLMs (for example Qwen3), plus reranker support to improve mixed queries
  • Ships a WebUI for inserting, querying, and visualizing the graph, and an optional API server via the [api] extra
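The query modes differ mainly in what gets retrieved before the LLM writes an answer. The toy dispatcher below paraphrases that difference under loud simplifying assumptions: word overlap stands in for embedding similarity, and the data, the `retrieve` function, and the per-mode rules are illustrative only, not LightRAG's actual retrieval logic. Roughly, naive returns text chunks, local returns entities named in the query, global returns relationships matching the query's themes, hybrid combines local and global, and mix adds plain chunk retrieval on top of the graph results.

```python
import re

# Toy corpus (hypothetical): text chunks, entity descriptions, relationship descriptions.
CHUNKS = ["Alice joined Acme in 2021.", "Acme builds solar panels in Berlin."]
ENTITIES = {"alice": "Alice: an engineer", "acme": "Acme: a solar company"}
RELATIONS = ["Alice works at Acme", "Acme is headquartered in Berlin"]


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def top_by_overlap(query: str, texts: list[str], k: int) -> list[str]:
    """Rank texts by shared words with the query (stand-in for vector similarity)."""
    q = tokens(query)
    ranked = sorted(texts, key=lambda t: len(q & tokens(t)), reverse=True)
    return [t for t in ranked[:k] if q & tokens(t)]


def retrieve(query: str, mode: str, top_k: int = 1) -> list[str]:
    # local: descriptions of entities mentioned by name in the query
    local = [desc for name, desc in ENTITIES.items() if name in tokens(query)]
    # global: relationship descriptions that best match the query
    global_ = top_by_overlap(query, RELATIONS, top_k)
    # naive: plain chunk retrieval, no graph involved
    chunks = top_by_overlap(query, CHUNKS, top_k)
    modes = {
        "naive": chunks,
        "local": local,
        "global": global_,
        "hybrid": local + global_,
        "mix": local + global_ + chunks,
    }
    if mode not in modes:
        raise ValueError(f"unknown mode: {mode!r}")
    return modes[mode]


for mode in ["naive", "local", "global", "hybrid", "mix"]:
    print(mode, retrieve("What does Acme build in Berlin", mode))
```

In the real library these stages run over embeddings and an LLM-built graph, and the retrieved context is then passed to the LLM to generate the answer.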

Getting started

Install the package from PyPI, set your model API key, then initialize storage, insert documents, and query. For a ready-made starting point, the repository also maintains an OpenAI demo script.

Install LightRAG

Install the core package from PyPI. Add the [api] extra if you want the API server and WebUI.

```bash
pip install lightrag-hku
# with API server + WebUI extras:
pip install "lightrag-hku[api]"
```
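To confirm which release is active (useful when comparing behavior against the docs), you can query the installed distribution from Python. This sketch uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "lightrag-hku") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


v = installed_version()
print(f"lightrag-hku {v}" if v else "lightrag-hku is not installed")
```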

Set your API key and get sample data

LightRAG's default demo uses OpenAI models, so export your key. You can grab a sample text file to try it out.

```bash
export OPENAI_API_KEY="sk-...your_openai_key..."
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
```
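If you prefer to do this preflight from Python (for example, in a notebook), the standard-library sketch below checks that the key is set and downloads the same sample file. The helper names `missing_api_key` and `fetch_sample` are hypothetical; the URL is the one used in the `curl` command.

```python
import os
import urllib.request
from pathlib import Path
from typing import Mapping

SAMPLE_URL = "https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt"


def missing_api_key(env: Mapping[str, str] = os.environ) -> bool:
    """True if OPENAI_API_KEY is unset or blank in the given environment."""
    return not env.get("OPENAI_API_KEY", "").strip()


def fetch_sample(dest: str = "./book.txt", url: str = SAMPLE_URL) -> Path:
    """Download the sample text once; skip the download if the file already exists."""
    path = Path(dest)
    if not path.exists():
        with urllib.request.urlopen(url, timeout=30) as resp:
            path.write_bytes(resp.read())
    return path
```

Call `missing_api_key()` before creating the LightRAG instance and `fetch_sample()` once to get `book.txt`; nothing touches the network until `fetch_sample()` runs.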

Initialize, insert, and query

Create a LightRAG instance, initialize its storages and pipeline status, insert text, then query with a chosen mode (mix is the default). LightRAG is async, so run the calls inside an event loop. Refer to examples/lightrag_openai_demo.py in the repo for the full, runnable script.

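```python
import asyncio

from lightrag import LightRAG, QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed


async def main():
    # LightRAG needs an embedding function and an LLM function; these
    # OpenAI helpers read OPENAI_API_KEY from the environment.
    rag = LightRAG(
        working_dir="./rag_storage",
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    try:
        # Storages and pipeline status must be initialized before inserting.
        await rag.initialize_storages()
        await initialize_pipeline_status()

        # Extract entities and relationships from the text and index them.
        with open("./book.txt", encoding="utf-8") as f:
            await rag.ainsert(f.read())

        # "mix" combines knowledge-graph and vector retrieval.
        answer = await rag.aquery(
            "What are the top themes in this story?",
            param=QueryParam(mode="mix"),
        )
        print(answer)
    finally:
        # Flush and close the storage backends.
        await rag.finalize_storages()


asyncio.run(main())
```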
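To see how the modes behave on your own data, it helps to run one question through each and read the answers side by side. The helper below is hypothetical; it accepts any async callable of the form `(question, mode) -> answer` so it can be exercised offline, and `fake_query` stands in for the real LLM call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical signature: (question, mode) -> awaitable answer string.
QueryFn = Callable[[str, str], Awaitable[str]]


async def compare_modes(
    query_fn: QueryFn,
    question: str,
    modes: Iterable[str] = ("naive", "local", "global", "hybrid", "mix"),
) -> Dict[str, str]:
    """Run one question through each mode in turn and collect the answers by mode."""
    results: Dict[str, str] = {}
    for mode in modes:
        # Sequential on purpose: gentler on API rate limits than firing all at once.
        results[mode] = await query_fn(question, mode)
    return results


async def fake_query(question: str, mode: str) -> str:
    # Offline stand-in for a real LightRAG query.
    await asyncio.sleep(0)
    return f"[{mode}] answer to: {question}"


results = asyncio.run(compare_modes(fake_query, "What are the top themes?"))
for mode, answer in results.items():
    print(f"{mode:>6}: {answer}")
```

With a real instance, you could call it from inside `main()` as `await compare_modes(lambda q, m: rag.aquery(q, param=QueryParam(mode=m)), question)`.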

Run the official demo instead

Rather than hand-wiring the LLM and embedding functions, you can run the maintained example script, which sets everything up for OpenAI.

```bash
python examples/lightrag_openai_demo.py
```

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Build a question-answering assistant over an internal document set that needs to reason across entities, not just match the nearest text chunk
  • Keep a knowledge base current with incremental document inserts and deletions without rebuilding the whole index
  • Answer broad, cross-document questions using global or mix query modes where simple vector retrieval falls short
  • Stand up a graph-backed RAG service using PostgreSQL, MongoDB, Neo4j, or OpenSearch and the optional API server
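Keeping a corpus current mostly comes down to knowing which documents changed. The sketch below is a hypothetical planner, not part of LightRAG: it hashes each file's text into a content ID and works out which files need inserting and which stale IDs need deleting.

```python
import hashlib
from typing import Dict, List, Tuple


def content_id(text: str) -> str:
    """Stable ID derived from document content (hypothetical scheme)."""
    return "doc-" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def plan_sync(
    current_docs: Dict[str, str], indexed: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare source files with what is already indexed.

    current_docs: path -> text as it is on disk now
    indexed:      path -> content ID recorded when that path was last inserted
    Returns (paths to insert, IDs to delete).
    """
    to_insert: List[str] = []
    to_delete: List[str] = []
    for path, text in current_docs.items():
        old_id = indexed.get(path)
        if old_id == content_id(text):
            continue  # unchanged: nothing to do
        if old_id is not None:
            to_delete.append(old_id)  # edited: drop the stale version
        to_insert.append(path)
    for path, old_id in indexed.items():
        if path not in current_docs:
            to_delete.append(old_id)  # file removed from the corpus
    return to_insert, to_delete


indexed = {"a.txt": content_id("alpha v1"), "b.txt": content_id("beta")}
current = {"a.txt": "alpha v2", "c.txt": "gamma"}
inserts, deletes = plan_sync(current, indexed)
print(inserts)  # ['a.txt', 'c.txt']
```

You would then delete the stale IDs and insert the changed files through LightRAG's document-level delete and insert methods; check the README for their exact names and signatures in your installed version.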

How LightRAG compares

LightRAG alongside the other open-source RAG frameworks and platforms that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
|---|---|---|
| Dify | ★ 146k | An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code. |
| RAGFlow | ★ 83.2k | A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer. |
| Context7 | ★ 57.7k | Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server. |
| Quivr | ★ 39.2k | Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type. |
| LightRAG | ★ 36.8k | Graph-based RAG that turns your documents into a queryable knowledge graph |
| GraphRAG | ★ 33.9k | Microsoft's graph-based RAG system that extracts a knowledge graph from documents to answer broad, multi-document questions. |
| PageIndex | ★ 33.2k | PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking. |
| FastGPT | ★ 28.6k | FastGPT is an open-source AI agent platform that pairs a built-in knowledge base with a drag-and-drop Flow editor, so you can build question-answering apps without heavy setup. |