Overview
RAG-Anything is an open-source Python framework for building retrieval-augmented generation (RAG) pipelines over documents that mix text with images, tables, charts, and mathematical equations. It is built on top of LightRAG and parses each document into a cross-modal knowledge graph, so a single query can pull together evidence from different content types instead of just plain text.
It is aimed at developers who work with rich, mixed-content sources such as academic papers, technical manuals, financial reports, and enterprise knowledge bases. Instead of stitching together separate tools for OCR, table extraction, and text retrieval, you ingest a file with one pipeline and query it through one interface.
Within the RAG frameworks category, it focuses on multimodal ingestion and retrieval. It uses a parser such as MinerU to break documents apart, then routes images, tables, and equations through dedicated processors before indexing them alongside the text.
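The routing step described above can be pictured as a dispatch table keyed by content type. The sketch below is illustrative only: the handler functions, the `PROCESSORS` mapping, and the item fields (`caption`, `rows`, `latex`) are hypothetical stand-ins, not RAG-Anything's actual processors or data model.

```python
from typing import Callable

# Hypothetical handlers standing in for the framework's dedicated processors.
def describe_image(item: dict) -> str:
    return f"image: {item.get('caption', 'no caption')}"

def describe_table(item: dict) -> str:
    return f"table with {len(item.get('rows', []))} rows"

def describe_equation(item: dict) -> str:
    return f"equation: {item.get('latex', '')}"

def passthrough_text(item: dict) -> str:
    return item.get("text", "")

# Map each content type to the handler that turns it into indexable text.
PROCESSORS: dict[str, Callable[[dict], str]] = {
    "image": describe_image,
    "table": describe_table,
    "equation": describe_equation,
    "text": passthrough_text,
}

def route(items: list[dict]) -> list[str]:
    """Send each parsed item to the handler for its type; unknown types are treated as text."""
    return [
        PROCESSORS.get(item.get("type", "text"), passthrough_text)(item)
        for item in items
    ]

# Example: a parsed document reduced to three items of different types.
parsed = [
    {"type": "text", "text": "Results are summarised in Table 1."},
    {"type": "table", "rows": [["model", "score"], ["A", "0.91"]]},
    {"type": "equation", "latex": "E = mc^2"},
]
indexable = route(parsed)
```

In the real framework each processor would likely call a model to describe the item. The point here is only that every modality ends up as text that can be indexed next to the prose.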
What it does
- End-to-end pipeline that goes from document ingestion and parsing to multimodal query answering
- Handles PDFs, Office documents, images, and other common file formats
- Dedicated processors for images, tables, and mathematical equations
- Builds a multimodal knowledge graph with automatic entity extraction and cross-modal relationships
- VLM-Enhanced Query mode feeds document images into a vision model for combined visual and text context
- Flexible processing: MinerU-based parsing or direct injection of pre-parsed content lists
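For the direct-injection path in the last bullet, a pre-parsed content list is, roughly, a list of typed dictionaries. The sketch below builds one and checks it before handing it to the library. The field names (`table_body`, `img_path`, `latex`, `page_idx`) are assumptions modeled on MinerU-style output, and the `REQUIRED` mapping and `validate` helper are hypothetical, so confirm the expected schema in the official repo.

```python
# Assumed schema: each entry has a "type" plus type-specific fields.
content_list = [
    {"type": "text", "text": "Quarterly revenue grew 12%.", "page_idx": 0},
    {
        "type": "table",
        "table_body": "| Quarter | Revenue |\n|---|---|\n| Q1 | 10 |",
        "table_caption": ["Revenue by quarter"],
        "page_idx": 1,
    },
    {"type": "equation", "latex": "g = (r_1 - r_0) / r_0", "page_idx": 1},
]

# Hypothetical minimum fields per type, used only by this sketch.
REQUIRED = {
    "text": {"text"},
    "table": {"table_body"},
    "equation": {"latex"},
    "image": {"img_path"},
}

def validate(items):
    """Return (index, problem) pairs for entries that look malformed."""
    problems = []
    for i, item in enumerate(items):
        kind = item.get("type")
        if kind not in REQUIRED:
            problems.append((i, f"unknown type {kind!r}"))
            continue
        missing = REQUIRED[kind] - set(item)
        if missing:
            problems.append((i, f"missing {sorted(missing)}"))
    return problems

problems = validate(content_list)
```

Once the list passes, it would be given to the library's content-list insertion method on a configured `RAGAnything` instance (believed to be `insert_content_list`, but treat the name and signature as assumptions until checked against the repo).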
Getting started
Install the package from PyPI, set up an LLM API key, then process a document and query it.
Install from PyPI
Install the core package, or add the [all] extra to pull in optional features. Python 3.10+ is required.
pip install raganything
pip install 'raganything[all]'  # With all optional features
Configure environment
Provide an OpenAI API key and parser settings in a .env file. LibreOffice is needed to process Office documents.
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=your_base_url # Optional
PARSER=mineru
PARSE_METHOD=auto
Process a document and query it
Configure RAGAnything with LLM and embedding functions, ingest a file, then run a query.
import asyncio
from functools import partial

from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc


async def main():
    # Storage directory, parser choice, and which modal processors to enable
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        parser="mineru",
        enable_image_processing=True,
        enable_table_processing=True,
    )

    # LLM callable handed to RAGAnything; None avoids a shared mutable default
    def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages or [],
            api_key="your-api-key",
            **kwargs,
        )

    # Embedding function; 3072 is the output dimension of text-embedding-3-large
    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=partial(
            openai_embed.func,
            model="text-embedding-3-large",
            api_key="your-api-key",
        ),
    )

    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        embedding_func=embedding_func,
    )

    # Parse the file and index its content
    await rag.process_document_complete(
        file_path="path/to/document.pdf",
        output_dir="./output",
    )

    # Ask a question against the indexed document
    result = await rag.aquery(
        "What are the main findings?",
        mode="hybrid",
    )
    print(result)


asyncio.run(main())
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
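The example above passes the API key as a string literal, while the environment section lists the same settings as variables. One way to connect the two is a small settings loader that reads those variable names from the process environment. This is a minimal sketch: the `Settings` class and `load_settings` function are hypothetical helpers, and the defaults simply mirror the values shown in the .env example. It reads variables that are already exported; loading the .env file itself would need a separate step, for example with a package such as python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: Optional[str]
    parser: str
    parse_method: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables named in the .env example, failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        api_key=api_key,
        base_url=env.get("OPENAI_BASE_URL") or None,  # optional, empty means unset
        parser=env.get("PARSER", "mineru"),
        parse_method=env.get("PARSE_METHOD", "auto"),
    )
```

With this in place, `settings.api_key` could replace the `"your-api-key"` literals, and `settings.parser` could be passed as the `parser` argument of `RAGAnythingConfig`.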
When to use it
- Querying academic papers where the answer depends on a figure, equation, or results table, not just the prose
- Building a knowledge base over technical manuals and product docs that mix diagrams with text
- Extracting and asking questions across financial reports that contain structured tables
- Powering enterprise search over mixed-content document collections through one ingestion pipeline
How RAG-Anything compares
RAG-Anything alongside the other open-source RAG frameworks and platforms that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Dify | ★ 146k | An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code. |
| RAGFlow | ★ 83.2k | A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer. |
| Context7 | ★ 57.7k | Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server. |
| Quivr | ★ 39.2k | Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type. |
| LightRAG | ★ 36.8k | A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates. |
| GraphRAG | ★ 33.9k | Microsoft's graph-based RAG system that extracts a knowledge graph from documents to answer broad, multi-document questions. |
| PageIndex | ★ 33.2k | PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking. |
| RAG-Anything | ★ 21.4k | An all-in-one multimodal RAG framework that retrieves over text, images, tables, and equations. |