VectifyAI · 2026-05-03 · major

PageIndex File System — Vectorless Reasoning RAG Scales to Millions of Documents

New File System layer extends PageIndex's vectorless RAG to million-document corpora using query-time virtual trees, on-demand hierarchies conditioned on the query, and adaptive traversal. The open-source library has about 28k GitHub stars.

PageIndex File System adds a query-time virtual tree that makes vectorless reasoning RAG work across millions of documents.

Key specs

GitHub stars	28,662

What is it?

PageIndex is an MIT-licensed open-source RAG library (28k stars) from VectifyAI that replaces vector similarity search with hierarchical document trees and LLM reasoning. On May 3, 2026, VectifyAI shipped the PageIndex File System — a new layer that extends this approach from single documents to million-document corpora.

How does it work?

The File System builds virtual tree nodes by clustering documents into topic-based groups and adding LLM-inferred metadata. Crucially, the hierarchy is built on demand at query time and conditioned on the query itself, so different queries produce different organizational views of the same corpus. The system adaptively switches between layer-wise traversal (when folder labels are meaningful) and recursive flattening (when labels carry little information), which bypasses uninformative structural levels. This lets the same reasoning-based retrieval that achieves 98.7% accuracy on FinanceBench scale to enterprise document repositories.
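
The adaptive traversal described above can be illustrated with a minimal sketch. It is not PageIndex's implementation or API: the names Node, is_informative, keyword_score, expand, and retrieve are hypothetical, the regex label check stands in for an LLM judgement about whether a folder label is meaningful, and the keyword-overlap score stands in for LLM reasoning over node labels.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a folder (has children) or a document (has doc_id)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    doc_id: Optional[str] = None

    @property
    def is_leaf(self) -> bool:
        return self.doc_id is not None


# Assumption: generic names such as "folder_01" or "misc" carry no topic signal.
_GENERIC = re.compile(r"(folder|dir|misc|untitled|tmp)[\s_-]*\d*")


def is_informative(label: str) -> bool:
    """Toy stand-in for an LLM deciding whether a label is meaningful."""
    return _GENERIC.fullmatch(label.strip().lower()) is None


def keyword_score(query: str, label: str) -> int:
    """Toy stand-in for LLM relevance reasoning: count shared words."""
    return len(set(query.lower().split()) & set(label.lower().split()))


def expand(node: Node) -> List[Node]:
    """Candidates for one traversal step.

    Informative folders and documents are kept as-is (layer-wise traversal).
    Uninformative folders are replaced by their contents, recursively
    (recursive flattening), so the level they form is bypassed.
    """
    candidates: List[Node] = []
    for child in node.children:
        if child.is_leaf or is_informative(child.label):
            candidates.append(child)
        else:
            candidates.extend(expand(child))
    return candidates


def retrieve(node: Node, query: str, top_k: int = 2) -> List[str]:
    """Descend into the top_k best-scoring candidates; return document ids."""
    if node.is_leaf:
        return [node.doc_id]
    ranked = sorted(expand(node), key=lambda c: keyword_score(query, c.label),
                    reverse=True)[:top_k]
    results: List[str] = []
    for candidate in ranked:
        results.extend(retrieve(candidate, query, top_k))
    return results


# Small example corpus with one meaningful folder and two uninformative ones.
root = Node("root", children=[
    Node("finance revenue reports", children=[
        Node("q1 revenue report", doc_id="fin-q1"),
        Node("q2 revenue report", doc_id="fin-q2"),
    ]),
    Node("folder_01", children=[
        Node("misc", children=[
            Node("gdpr compliance policy", doc_id="legal-gdpr"),
        ]),
        Node("employee handbook", doc_id="hr-handbook"),
    ]),
])

print(retrieve(root, "gdpr compliance policy", top_k=1))  # ['legal-gdpr']
print(retrieve(root, "q2 revenue", top_k=1))              # ['fin-q2']
```

In the first query the document sits two uninformative levels deep ("folder_01" then "misc"), yet it is scored directly at the root because both levels are flattened away. In the second query the folder label is meaningful, so the sketch scores the folder first and only then looks at its documents.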

Why does it matter?

Flat document storage breaks down at scale because there is no structure to reason over. The File System adds that missing semantic hierarchy dynamically, keeping accuracy high as corpus size grows and closing a critical gap for enterprise document QA and knowledge management.

Who is it for?

Enterprises that build large-scale document QA, compliance search, or knowledge management systems and need high accuracy on complex multi-document queries.

Try it

pip install pageindex