Overview
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine from InfiniFlow. It focuses on deep document understanding, parsing messy real-world files such as Word documents, slides, spreadsheets, scanned images, web pages, and structured data into chunks that an LLM can answer questions over.
It is aimed at developers and teams who need to build a question-answering layer on top of their own documents. You configure your own LLMs and embedding models, and RAGFlow handles the chunking, retrieval, re-ranking, and citation tracking so answers stay grounded in the source material.
As a RAG framework, it sits between your raw documents and your application. Compared with writing a retrieval pipeline by hand, it provides template-based chunking, visualized text splitting for human review, and traceable citations to reduce hallucinations.
What it does
- Deep document understanding that extracts knowledge from unstructured files with complex formats
- Template-based chunking with multiple template options and visualized splitting for human review
- Grounded answers with traceable citations and quick access to key references
- Works with heterogeneous sources: Word, slides, Excel, text, images, scanned copies, structured data, and web pages
- Configurable LLMs and embedding models, with multiple recall paths and fused re-ranking
- APIs for integrating retrieval and answering into your own applications
Getting started
RAGFlow is self-hosted with Docker Compose. You need Docker >= 24.0.0, Docker Compose >= v2.26.1, and a host with at least 4 CPU cores, 16 GB RAM, and 50 GB disk.
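The version and CPU minimums above can be checked before installing anything. The following is a minimal sketch, not part of RAGFlow: the helper names `version_ge`, `check_version`, and `preflight` are hypothetical, and it assumes a Linux host where `sort -V` (GNU coreutils) and `nproc` are available. RAM and disk checks are left out.

```shell
# Preflight sketch: compare installed versions and CPU count with the minimums above.
# All function names here are hypothetical helpers, not RAGFlow or Docker commands.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  # Strip an optional leading "v" (as in v2.26.1), then check that B sorts first.
  [ "$(printf '%s\n%s\n' "${1#v}" "${2#v}" | sort -V | head -n 1)" = "${2#v}" ]
}

# check_version NAME FOUND REQUIRED: prints a result line, fails if FOUND is empty or too old.
check_version() {
  if [ -n "$2" ] && version_ge "$2" "$3"; then
    echo "ok: $1 $2 (>= $3)"
  else
    echo "FAIL: $1 ${2:-not found} (need >= $3)"
    return 1
  fi
}

# preflight: runs every check and returns non-zero if any of them failed.
preflight() {
  pf_status=0
  check_version "Docker" \
    "$(docker version --format '{{.Server.Version}}' 2>/dev/null)" 24.0.0 || pf_status=1
  check_version "Docker Compose" \
    "$(docker compose version --short 2>/dev/null)" 2.26.1 || pf_status=1
  if [ "$(nproc)" -ge 4 ]; then
    echo "ok: $(nproc) CPU cores"
  else
    echo "FAIL: $(nproc) CPU cores (need >= 4)"
    pf_status=1
  fi
  return "$pf_status"
}
```

After sourcing the snippet, running `preflight` prints one line per check; nothing is executed until you call it.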
Set vm.max_map_count
RAGFlow's vector store needs the kernel setting vm.max_map_count raised to at least 262144. Set it before starting. A value set with sysctl -w does not survive a reboot, so reapply it after restarting the host.
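To avoid changing the setting when it is already high enough, the current value can be read first. This is a small sketch under stated assumptions: `needs_raise`, `ensure_map_count`, and `persist_line` are hypothetical helper names, and persisting through /etc/sysctl.conf is the usual Linux convention rather than something this page prescribes.

```shell
# Sketch: check vm.max_map_count and raise it only when it is too low.
# The helper names below are hypothetical, chosen for illustration.
REQUIRED_MAP_COUNT=262144

# needs_raise CURRENT [REQUIRED]: succeeds when CURRENT is below the required value.
needs_raise() {
  [ "$1" -lt "${2:-$REQUIRED_MAP_COUNT}" ]
}

# ensure_map_count: reads the live value and raises it for the current boot if needed.
ensure_map_count() {
  current="$(sysctl -n vm.max_map_count)"
  if needs_raise "$current"; then
    echo "vm.max_map_count is $current; raising to $REQUIRED_MAP_COUNT"
    sudo sysctl -w "vm.max_map_count=$REQUIRED_MAP_COUNT"
  else
    echo "vm.max_map_count is $current; no change needed"
  fi
}

# persist_line: prints the line that keeps the value across reboots
# when added to /etc/sysctl.conf.
persist_line() {
  echo "vm.max_map_count=$REQUIRED_MAP_COUNT"
}
```

Calling `ensure_map_count` applies the change for the current boot only. One way to make it permanent is `persist_line | sudo tee -a /etc/sysctl.conf` followed by `sudo sysctl -p`.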
sudo sysctl -w vm.max_map_count=262144
Clone the repository
Get the source, which includes the Docker Compose files under the docker/ directory.
git clone https://github.com/infiniflow/ragflow.git
Start the server
Bring up the stack with Docker Compose. The containers include the RAGFlow server and its dependencies.
cd ragflow/docker
docker compose -f docker-compose.yml up -d
Open the web UI
Once the server is up, open a browser at the IP address of your machine (port 80 by default) and log in to RAGFlow to upload documents and start asking questions.
http://IP_OF_YOUR_MACHINE
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
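The containers can take a while to become ready after `docker compose up -d`, so it can help to poll the web UI address before opening a browser. The sketch below is an illustration only: `ui_url` and `wait_for_ui` are hypothetical helpers, curl is assumed to be installed, and the non-default port in the example is an assumption for hosts where the port has been remapped.

```shell
# Sketch: build the web UI address and poll it until the server responds.
# `ui_url` and `wait_for_ui` are hypothetical helpers; curl is assumed to be installed.

# ui_url HOST [PORT]: prints the UI address, leaving out the default port 80.
ui_url() {
  if [ -z "${2:-}" ] || [ "$2" = "80" ]; then
    echo "http://$1"
  else
    echo "http://$1:$2"
  fi
}

# wait_for_ui URL [ATTEMPTS] [DELAY_SECONDS]: succeeds once URL answers
# with an HTTP status below 400; gives up after ATTEMPTS tries (default 30).
wait_for_ui() {
  attempts="${2:-30}"
  while [ "$attempts" -gt 0 ]; do
    if curl --silent --fail --output /dev/null --max-time 5 "$1"; then
      echo "RAGFlow UI is reachable at $1"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${3:-5}"
  done
  echo "gave up waiting for $1" >&2
  return 1
}
```

For example, `wait_for_ui "$(ui_url IP_OF_YOUR_MACHINE)"` retries every 5 seconds for up to 30 attempts, and `ui_url IP_OF_YOUR_MACHINE 8080` builds the address for a remapped port.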
When to use it
- Build a question-answering assistant over internal company documents and knowledge bases
- Add citation-backed answers to a product so users can trace responses to source passages
- Turn scanned PDFs, contracts, or reports into a searchable, grounded retrieval layer
- Prototype and self-host a RAG pipeline with your own choice of LLM and embedding models
How RAGFlow compares
RAGFlow alongside the other open-source RAG frameworks and platforms that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Dify | ★ 146k | An open-source platform with a visual workflow builder for creating LLM and RAG applications without writing much code. |
| RAGFlow | ★ 83.2k | Open-source RAG engine with deep document understanding and grounded, traceable citations. |
| Context7 | ★ 57.7k | Context7 pulls current, version-specific documentation and code examples for any library and feeds them into your LLM, available as a CLI skill or an MCP server. |
| Quivr | ★ 39.2k | Quivr is an open-source RAG framework that ingests your documents and answers questions about them, working with any LLM and any file type. |
| LightRAG | ★ 36.8k | A graph-based RAG system that builds an entity-and-relationship knowledge graph for fast retrieval and easy incremental updates. |
| GraphRAG | ★ 33.9k | Microsoft's graph-based RAG system that extracts a knowledge graph from documents to answer broad, multi-document questions. |
| PageIndex | ★ 33.2k | PageIndex turns long PDFs into a table-of-contents tree and uses LLM reasoning to retrieve relevant sections, with no vector database and no chunking. |
| FastGPT | ★ 28.6k | FastGPT is an open-source AI agent platform that pairs a built-in knowledge base with a drag-and-drop Flow editor, so you can build question-answering apps without heavy setup. |