AI/TLDR

RAG-Anything

All-in-one multimodal RAG that retrieves over text, images, tables, and equations

Overview

RAG-Anything is an open-source Python framework for building retrieval-augmented generation (RAG) pipelines over documents that mix text with images, tables, charts, and mathematical equations. It is built on top of LightRAG and parses each document into a cross-modal knowledge graph, so a single query can combine evidence from several content types rather than from plain text alone.

It is aimed at developers who work with rich, mixed-content sources such as academic papers, technical manuals, financial reports, and enterprise knowledge bases. Instead of stitching together separate tools for OCR, table extraction, and text retrieval, you ingest a file with one pipeline and query it through one interface.

Within the RAG frameworks category, it focuses on multimodal ingestion and retrieval. It uses a parser such as MinerU to break documents apart, then routes images, tables, and equations through dedicated processors before indexing them alongside the text.
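
The routing step described above can be pictured as a small dispatch table. This is a minimal sketch of the idea only, not RAG-Anything's implementation: the handler functions, the `HANDLERS` mapping, and the item fields (`type`, `img_path`, `table_body`, `latex`, `text`) are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical stand-ins for the dedicated processors (not the library's classes).
def describe_image(item: dict) -> str:
    return f"image at {item.get('img_path', '?')}"

def describe_table(item: dict) -> str:
    return f"table: {item.get('table_body', '')}"

def describe_equation(item: dict) -> str:
    return f"equation: {item.get('latex', '')}"

def describe_text(item: dict) -> str:
    return item.get("text", "")

# One handler per non-text modality; anything else is treated as text.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "image": describe_image,
    "table": describe_table,
    "equation": describe_equation,
}

def route(items: list[dict]) -> list[tuple[str, str]]:
    """Return (modality, indexable text) pairs for parsed document items."""
    routed = []
    for item in items:
        kind = item.get("type", "text")
        if kind in HANDLERS:
            routed.append((kind, HANDLERS[kind](item)))
        else:
            routed.append(("text", describe_text(item)))
    return routed
```

In a real pipeline each handler would call a model or a parser rather than format a string; the point of the sketch is only that every item ends up as indexable text tagged with its modality.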

What it does

  • End-to-end pipeline that goes from document ingestion and parsing to multimodal query answering
  • Handles PDFs, Office documents, images, and other common file formats
  • Dedicated processors for images, tables, and mathematical equations
  • Builds a multimodal knowledge graph with automatic entity extraction and cross-modal relationships
  • VLM-Enhanced Query mode feeds document images into a vision model for combined visual and text context
  • Flexible processing: MinerU-based parsing or direct injection of pre-parsed content lists
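
For the last item, a pre-parsed content list is essentially a list of typed records. The sketch below shows one plausible shape and a sanity check to run before handing such a list to the framework. The field names are assumptions modeled on MinerU-style output, the sample values are made up, and `validate_content_list` is a hypothetical helper; check the official repo for the exact schema and for the method that accepts the list.

```python
# Assumed minimum field per item type; not confirmed against the library's schema.
REQUIRED_FIELDS = {
    "text": ("text",),
    "image": ("img_path",),
    "table": ("table_body",),
    "equation": ("latex",),
}

def validate_content_list(content_list):
    """Return a list of problems; an empty list means the items look usable."""
    problems = []
    for i, item in enumerate(content_list):
        kind = item.get("type")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"item {i}: unknown type {kind!r}")
            continue
        for field in REQUIRED_FIELDS[kind]:
            if not item.get(field):
                problems.append(f"item {i}: {kind} item is missing {field!r}")
    return problems

# Illustrative content only; the values are invented for the example.
content_list = [
    {"type": "text", "text": "Results are summarised in Table 1.", "page_idx": 0},
    {"type": "table", "table_body": "| Model | Acc |\n|---|---|\n| A | 0.91 |", "page_idx": 1},
    {"type": "equation", "latex": "F_1 = 2PR/(P+R)", "page_idx": 1},
]

print(validate_content_list(content_list))
```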

Getting started

Install the package from PyPI, set up an LLM API key, then process a document and query it.

Install from PyPI

Install the core package, or add the [all] extra to pull in optional features. Python 3.10+ is required.

bash
pip install raganything
pip install 'raganything[all]'  # With all optional features

Configure environment

Provide an OpenAI API key and parser settings in a .env file. LibreOffice is needed to process Office documents.

bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=your_base_url  # Optional
PARSER=mineru
PARSE_METHOD=auto
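
If you want your own code to read the same settings, a small loader along these lines may help. It is a sketch under assumptions: `load_settings` is a hypothetical helper, the fallback values simply mirror the sample file above rather than confirmed library defaults, and it reads the process environment, so a .env file has to be loaded first (for example with python-dotenv's `load_dotenv()`).

```python
import os

def load_settings(env=None):
    """Collect the settings from the sample .env file into a dict."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early instead of sending unauthenticated requests later.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL") or None,  # optional
        "parser": env.get("PARSER", "mineru"),           # assumed fallback
        "parse_method": env.get("PARSE_METHOD", "auto"), # assumed fallback
    }
```

Passing a plain dict as `env` makes the function easy to exercise without touching real environment variables.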

Process a document and query it

Configure RAGAnything with LLM and embedding functions, ingest a file, then run a query.

python
import asyncio
from functools import partial
from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

async def main():
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        parser="mineru",
        enable_image_processing=True,
        enable_table_processing=True,
    )

    def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs):
        # Default to None so one list is not shared across calls.
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages or [],
            api_key="your-api-key",
            **kwargs,
        )

    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=partial(
            openai_embed.func,
            model="text-embedding-3-large",
            api_key="your-api-key",
        ),
    )

    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        embedding_func=embedding_func,
    )

    await rag.process_document_complete(
        file_path="path/to/document.pdf",
        output_dir="./output"
    )

    result = await rag.aquery(
        "What are the main findings?",
        mode="hybrid"
    )
    print(result)

asyncio.run(main())

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Querying academic papers where the answer depends on a figure, equation, or results table, not just the prose
  • Building a knowledge base over technical manuals and product docs that mix diagrams with text
  • Extracting and asking questions across financial reports that contain structured tables
  • Powering enterprise search over mixed-content document collections through one ingestion pipeline
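
For the collection-scale case, one way to drive a single ingestion pipeline over a folder is a bounded-concurrency loop. This is a sketch, not part of RAG-Anything: `ingest_folder` and its parameters are hypothetical, and whether concurrent calls on one instance are safe is not confirmed here, so set `max_concurrent=1` to process files one at a time.

```python
import asyncio
from pathlib import Path

async def ingest_folder(folder, process_one, extensions=(".pdf", ".docx"), max_concurrent=2):
    """Run process_one(path) for each matching file; return (file name, error or None) pairs."""
    sem = asyncio.Semaphore(max_concurrent)
    files = sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )

    async def worker(path):
        async with sem:  # cap the number of documents in flight
            try:
                await process_one(str(path))
                return path.name, None
            except Exception as exc:  # record the failure and keep going
                return path.name, repr(exc)

    return await asyncio.gather(*(worker(p) for p in files))
```

With the `rag` object from the getting-started example, `process_one` could be `lambda p: rag.process_document_complete(file_path=p, output_dir="./output")`.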

How RAG-Anything compares

RAG-Anything alongside the other open-source RAG frameworks and platforms that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Dify | ★ 146k | An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code.
RAGFlow | ★ 83.2k | A RAG engine built around deep document understanding that turns complex files into a grounded, citation-backed question-answering layer.
Context7 | ★ 57.7k | Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server.
Quivr | ★ 39.2k | Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type.
LightRAG | ★ 36.8k | A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates.
GraphRAG | ★ 33.9k | Microsoft's graph-based RAG system that extracts a knowledge graph from documents to answer broad, multi-document questions.
PageIndex | ★ 33.2k | PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking.
RAG-Anything | ★ 21.4k | All-in-one multimodal RAG that retrieves over text, images, tables, and equations