RAG-Anything

All-in-one multimodal RAG that retrieves over text, images, tables, and equations

github.com/HKUDS/RAG-Anything (★ 21.4k) | arxiv.org/abs/2510.12323

Overview

RAG-Anything is an open-source Python framework for building retrieval-augmented generation (RAG) pipelines over documents that mix text with images, tables, charts, and mathematical equations. It is built on top of LightRAG and parses each document into a cross-modal knowledge graph, so a single query can pull together evidence from different content types instead of just plain text.

It is aimed at developers who work with rich, mixed-content sources such as academic papers, technical manuals, financial reports, and enterprise knowledge bases. Instead of stitching together separate tools for OCR, table extraction, and text retrieval, you ingest a file with one pipeline and query it through one interface.

Within the RAG frameworks category, it focuses on multimodal ingestion and retrieval. It uses a parser such as MinerU to break documents apart, then routes images, tables, and equations through dedicated processors before indexing them alongside the text.
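
The routing step can be pictured with a small sketch. This is not RAG-Anything's actual code: the item layout (a list of dicts with a `type` key) and the handler names are assumptions made up for illustration, loosely modeled on parser output that tags each block by modality.

```python
from typing import Callable

# Hypothetical handlers: each turns one parsed block into indexable text.
def describe_image(item: dict) -> str:
    return f"[image] {item.get('caption', 'no caption')}"

def describe_table(item: dict) -> str:
    return f"[table] {item.get('caption', 'no caption')}: {item.get('body', '')}"

def describe_equation(item: dict) -> str:
    return f"[equation] {item.get('latex', '')}"

def passthrough_text(item: dict) -> str:
    return item.get("text", "")

# Dispatch table keyed by the block's modality tag (assumed field name: "type").
PROCESSORS: dict[str, Callable[[dict], str]] = {
    "image": describe_image,
    "table": describe_table,
    "equation": describe_equation,
    "text": passthrough_text,
}

def route(items: list[dict]) -> list[str]:
    """Send each parsed block to its processor; unknown types fall back to text."""
    return [PROCESSORS.get(item.get("type"), passthrough_text)(item) for item in items]

parsed = [
    {"type": "text", "text": "Revenue grew in Q3."},
    {"type": "table", "caption": "Quarterly revenue", "body": "Q2: 10 | Q3: 12"},
    {"type": "equation", "latex": "g = (12 - 10) / 10"},
]
for line in route(parsed):
    print(line)
```

A dispatch table keeps the per-modality logic separate, so adding another content type means registering one more handler and leaving the loop unchanged.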

What it does

End-to-end pipeline that goes from document ingestion and parsing to multimodal query answering
Handles PDFs, Office documents, images, and other common file formats
Dedicated processors for images, tables, and mathematical equations
Builds a multimodal knowledge graph with automatic entity extraction and cross-modal relationships
VLM-Enhanced Query mode feeds document images into a vision model for combined visual and text context
Flexible processing: MinerU-based parsing or direct injection of pre-parsed content lists

Getting started

Install the package from PyPI, set up an LLM API key, then process a document and query it.

Install from PyPI

Install the core package, or add the [all] extra to pull in optional features. Python 3.10+ is required.

pip install raganything
pip install 'raganything[all]'  # With all optional features

Configure environment

Provide an OpenAI API key and parser settings in a .env file. LibreOffice is needed to process Office documents.

OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=your_base_url  # Optional
PARSER=mineru
PARSE_METHOD=auto
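
If you would rather read these values in your own script than hard-code them, a standard-library sketch might look like the following. The `load_settings` helper and its defaults are assumptions for illustration, not part of RAG-Anything. The sketch also assumes the variables are already present in the process environment (for example, exported by a dotenv loader), which it does not do itself.

```python
import os
from typing import Mapping, Optional

def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the variables from the .env example, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL"),           # optional, may be None
        "parser": env.get("PARSER", "mineru"),            # default mirrors the example above
        "parse_method": env.get("PARSE_METHOD", "auto"),  # default mirrors the example above
    }

# Demo with an explicit mapping so the snippet runs without touching your real environment.
settings = load_settings({"OPENAI_API_KEY": "your_openai_api_key"})
print(settings["parser"], settings["parse_method"])
```

Calling `load_settings()` with no argument reads from `os.environ`; the resulting values can then be passed to the model functions instead of a literal key.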

Process a document and query it

Configure RAGAnything with LLM and embedding functions, ingest a file, then run a query.

import asyncio
from functools import partial
from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

async def main():
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        parser="mineru",
        enable_image_processing=True,
        enable_table_processing=True,
    )

    # history_messages defaults to None to avoid sharing one mutable list across calls.
    def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages or [],
            api_key="your-api-key",
            **kwargs,
        )

    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=partial(
            openai_embed.func,
            model="text-embedding-3-large",
            api_key="your-api-key",
        ),
    )

    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        embedding_func=embedding_func,
    )

    await rag.process_document_complete(
        file_path="path/to/document.pdf",
        output_dir="./output"
    )

    result = await rag.aquery(
        "What are the main findings?",
        mode="hybrid"
    )
    print(result)

asyncio.run(main())

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
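
To ingest a whole folder rather than a single file, one option is to loop over it and call the same `process_document_complete` method used in the example above. The sketch below is illustrative only: `ingest_folder`, the extension list, and the stand-in `FakeRAG` class are hypothetical, and the stub exists solely so the snippet runs without API keys. In practice you would pass in the `rag` object built earlier.

```python
import asyncio
import tempfile
from pathlib import Path

# Assumed subset of formats to pick up; adjust to what your parser supports.
SUPPORTED = {".pdf", ".docx", ".pptx", ".png", ".jpg"}

async def ingest_folder(rag, folder: str, output_dir: str = "./output") -> list[str]:
    """Process supported files one at a time and return the paths that were ingested."""
    done = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            await rag.process_document_complete(file_path=str(path), output_dir=output_dir)
            done.append(str(path))
    return done

class FakeRAG:
    """Stand-in with the same method name, used only to make this sketch runnable."""
    def __init__(self):
        self.seen = []

    async def process_document_complete(self, file_path: str, output_dir: str):
        self.seen.append(file_path)

def demo() -> list[str]:
    # Create a throwaway folder with two supported files and one unsupported file.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.pdf", "notes.txt", "b.PNG"):
            (Path(tmp) / name).touch()
        ingested = asyncio.run(ingest_folder(FakeRAG(), tmp))
        return [Path(p).name for p in ingested]

print(demo())
```

Files are processed sequentially here to keep the sketch simple. Whether concurrent ingestion is safe depends on the parser and storage backend, so check the official repo before parallelizing.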

When to use it

Querying academic papers where the answer depends on a figure, equation, or results table, not just the prose
Building a knowledge base over technical manuals and product docs that mix diagrams with text
Extracting and asking questions across financial reports that contain structured tables
Powering enterprise search over mixed-content document collections through one ingestion pipeline

How RAG-Anything compares

RAG-Anything alongside the other open-source RAG frameworks and platforms that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Dify	★ 146k	An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code.
RAGFlow	★ 83.2k	A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer.
Context7	★ 57.7k	Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server.
Quivr	★ 39.2k	Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type.
LightRAG	★ 36.8k	A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates.
GraphRAG	★ 33.9k	Microsoft's graph-based RAG system that extracts a knowledge graph from documents to answer broad, multi-document questions.
PageIndex	★ 33.2k	PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking.
RAG-Anything	★ 21.4k	All-in-one multimodal RAG that retrieves over text, images, tables, and equations
