AI/TLDR

HKUDS · 2026-03-24 · major

RAG-Anything — All-in-One Multi-Modal RAG Framework for Text, Images, Tables, and Equations

RAG-Anything extends LightRAG to handle rich mixed-content documents: images, tables, equations, and charts alongside text. It uses dual-graph construction and cross-modal hybrid retrieval. The repository has 17.6k GitHub stars and is trending today.


RAG for real-world documents — handles images, tables, equations, and charts alongside text in a single pipeline.

What is it?

RAG-Anything is an open-source framework that extends LightRAG to work with multimodal documents — PDFs, reports, scientific papers — that mix text with images, tables, equations, and charts. Instead of discarding non-text content, it processes each element through specialized parsers and builds a cross-modal knowledge graph for retrieval. Version 1.2.10 added a custom parser plugin system and processing event callbacks.

How does it work?

Documents are parsed by MinerU, Docling, or PaddleOCR into typed elements. Image regions go through a VLM for captioning, tables are structured into relational entries, and equations are converted to LaTeX. All elements are then merged into a dual graph: one graph captures cross-modal entity relationships and the other captures textual semantics. Queries hit both graphs, and the results are fused before ranking.
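As a rough illustration of the flow described above, here is a minimal Python sketch. It is not RAG-Anything's API: the `Element` class and the `to_indexable_text` and `fuse_rankings` functions are hypothetical names invented for this example, the modality handlers are placeholders rather than real VLM or table processing, and reciprocal rank fusion is used only as a stand-in because the fusion method RAG-Anything actually applies is not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One parsed document element (hypothetical shape, not the library's type)."""
    element_id: str
    kind: str      # "text", "image", "table", or "equation"
    content: str


def to_indexable_text(element: Element) -> str:
    """Route an element by type to a placeholder for its modality-specific step."""
    if element.kind == "image":
        # A real pipeline would ask a VLM for a caption; this only tags the text.
        return f"[image caption] {element.content}"
    if element.kind == "table":
        return f"[table rows] {element.content}"
    if element.kind == "equation":
        return f"[latex] {element.content}"
    return element.content


def fuse_rankings(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, element_id in enumerate(ranking, start=1):
            scores[element_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the order is deterministic.
    return sorted(scores, key=lambda eid: (-scores[eid], eid))


if __name__ == "__main__":
    # Pretend each graph returned its own ranked list of element ids.
    cross_modal_hits = ["t1", "img1", "tab1"]
    textual_hits = ["img1", "eq1", "t1"]
    print(fuse_rankings([cross_modal_hits, textual_hits]))
    print(to_indexable_text(Element("eq1", "equation", "E = mc^2")))
```

In this toy run, `img1` ranks first because it appears near the top of both lists, which is the intuition behind fusing results from two graphs before the final ranking.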

Why does it matter?

Most production RAG systems convert documents to plain text, discarding tables and figures along the way. RAG-Anything keeps that information in the graph, which is intended to improve answers on documents where tables, figures, and equations carry much of the meaning, such as financial reports, technical manuals, and research papers.

Who is it for?

ML engineers and backend developers building RAG over rich, mixed-content document corpora


Tags

  • rag
  • multi-modal
  • retrieval-augmented-generation
  • knowledge-graph
  • lightrag
  • python
  • open-source
  • documents
