Zerox

OCR any document by passing page images through a vision model to get Markdown

GitHub: github.com/getomni-ai/zerox (★ 12.2k)
Demo: getomni.ai/ocr-demo

Overview

Zerox is an OCR tool that turns documents into Markdown for AI ingestion. Instead of relying on traditional text-extraction rules, it converts each page of a file (PDF, DOCX, image, and more) into an image, sends that image to a vision model, asks for Markdown, and aggregates the per-page responses into one result.
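
The flow described above can be sketched in a few lines. This is a minimal illustration, not Zerox's actual implementation: PageImage, VisionModel, PROMPT, and pagesToMarkdown are hypothetical names, and the model is passed in as a plain function so the sketch stays independent of any provider SDK.

```typescript
// Hypothetical types for illustration; these are not Zerox's internals.
type PageImage = { pageNumber: number; data: Uint8Array };
type VisionModel = (image: PageImage, prompt: string) => Promise<string>;

// Assumed prompt text; the real prompt may differ.
const PROMPT = "Convert this page to Markdown.";

// Send every page image to the model, then stitch the per-page Markdown
// back together in page order.
async function pagesToMarkdown(
  pages: PageImage[],
  model: VisionModel,
): Promise<string> {
  const perPage = await Promise.all(
    pages.map(async (page) => ({
      pageNumber: page.pageNumber,
      markdown: await model(page, PROMPT),
    })),
  );
  return perPage
    .sort((a, b) => a.pageNumber - b.pageNumber)
    .map((p) => p.markdown.trim())
    .join("\n\n");
}
```

Sorting by page number before joining means the output order does not depend on which model call happens to finish first.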

It is aimed at developers who need to feed documents with awkward layouts, tables, and charts into an LLM pipeline. Because vision models read a page the way a person would, Zerox tends to hold up on content that line-based parsers struggle with.

Within the RAG and retrieval space, Zerox fits the document parsing and ingestion step: it produces the clean Markdown you then chunk, embed, and index. It ships as both a Node package (zerox on npm) and a Python package (py-zerox on PyPI), and supports OpenAI, Azure OpenAI, AWS Bedrock, and Google Gemini, with Vertex AI on the Python side.

What it does

Converts PDF, DOCX, and image files into Markdown by sending page images to a vision model
Works with multiple providers: OpenAI, Azure OpenAI, AWS Bedrock, and Google Gemini (plus Vertex AI in Python)
maintainFormat option carries a prior page's output into the next request to keep tables and formatting consistent across pages
Concurrent page processing via the concurrency option, with configurable retries and error modes
Structured data extraction with a schema, including per-page extraction (Node)
Image controls like orientation correction, edge trimming, DPI/imageDensity, and page selection
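
To make the concurrency and retry behaviour concrete, here is a generic sketch of bounded-parallel processing with per-item retries. It shows the general technique only and is not taken from Zerox's source: mapWithLimit and its parameters are hypothetical names, and the real library may schedule pages and handle errors differently.

```typescript
// Run `task` over `items` with at most `limit` calls in flight, retrying each
// failed item up to `maxRetries` extra times. Results keep the input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  maxRetries: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim the next unprocessed item
      for (let attempt = 0; ; attempt++) {
        try {
          results[index] = await task(items[index] as T);
          break;
        } catch (err) {
          if (attempt >= maxRetries) throw err; // out of retries: give up
        }
      }
    }
  }

  // Start a fixed pool of workers; each pulls items until none remain.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A fixed pool of workers pulling from a shared index keeps the number of in-flight requests at or below the limit, which matters when the provider enforces rate limits.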

Getting started

Zerox is available as both a Node and a Python package. The Node quickstart below OCRs a PDF and returns Markdown.

Install the package

Install Zerox from npm. It uses graphicsmagick and ghostscript for the PDF-to-image step; these are usually pulled in automatically, but you may need to install graphicsmagick yourself (for example, apt-get install graphicsmagick on Debian or Ubuntu).

bash

npm install zerox

OCR a document to Markdown

Call zerox with a file path (local path or URL) and your provider credentials. The result contains the aggregated Markdown plus per-page content and token usage.

ts

import { zerox } from "zerox";

const result = await zerox({
  filePath: "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf",
  credentials: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});
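
If you want to work from the per-page content and token counts yourself, a small helper like the one below may be useful. The OcrPage and OcrResult interfaces are an assumed, simplified shape written for this illustration (a pages array with page and content fields, plus inputTokens and outputTokens); check the type exported by the package before relying on these field names.

```typescript
// Assumed result shape, for illustration only; verify against the real type.
interface OcrPage {
  page: number;
  content: string;
}

interface OcrResult {
  pages: OcrPage[];
  inputTokens: number;
  outputTokens: number;
}

// Join per-page Markdown in page order without mutating the result.
function toMarkdown(result: OcrResult): string {
  return result.pages
    .slice()
    .sort((a, b) => a.page - b.page)
    .map((p) => p.content)
    .join("\n\n");
}

// Total tokens billed for the run, handy for rough cost tracking.
function totalTokens(result: OcrResult): number {
  return result.inputTokens + result.outputTokens;
}
```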

Tune for tables and multi-page formatting

If your documents have tables that cross pages, enable maintainFormat so each page is processed with the previous page's Markdown as context. Pages are then processed one at a time rather than in parallel, which is slower but keeps formatting consistent.

ts

const result = await zerox({
  filePath: "path/to/file.pdf",
  credentials: { apiKey: process.env.OPENAI_API_KEY },
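  maintainFormat: true, // each page waits for the previous page's Markdown
  concurrency: 10, // only matters for parallel runs; pages run one at a time here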
});
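
The reason this mode cannot run in parallel is easier to see in code. The sketch below is an illustration of the idea, not Zerox's implementation: ContextualModel and ocrWithCarriedFormat are hypothetical names, and how the previous page's Markdown is actually worked into the prompt is left to the model function.

```typescript
// Hypothetical model signature: takes a page image plus the previous page's
// Markdown (null for the first page) and returns Markdown for this page.
type ContextualModel = (
  image: Uint8Array,
  previousMarkdown: string | null,
) => Promise<string>;

async function ocrWithCarriedFormat(
  images: Uint8Array[],
  model: ContextualModel,
): Promise<string[]> {
  const pages: string[] = [];
  let previous: string | null = null;
  for (const image of images) {
    // Each call needs the output of the call before it, so the requests
    // form a chain and must be awaited one after another.
    const markdown = await model(image, previous);
    pages.push(markdown);
    previous = markdown;
  }
  return pages;
}
```

Total latency grows roughly linearly with page count in this mode, so it is worth reserving for documents where cross-page tables or formatting really matter.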

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

Ingesting scanned or image-heavy PDFs into a RAG pipeline as clean Markdown for chunking and embedding
Extracting tables and structured data from invoices, receipts, or reports using a schema
Parsing documents with awkward layouts, charts, or multi-column text that line-based OCR mishandles
Batch-converting a mix of PDFs, DOCX files, and images into Markdown for an LLM workflow

How Zerox compares

Zerox alongside other open-source parsing & ingestion tools AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
MarkItDown	★ 156k	A Microsoft Python utility that converts many file types, including Office docs and PDFs, into Markdown for LLMs.
MinerU	★ 68.1k	A document extraction tool that converts PDFs and Office files into clean Markdown or JSON, with strong handling of complex layouts and CJK content.
Docling	★ 61.9k	An IBM-originated document conversion pipeline that turns PDF, DOCX, PPTX, HTML, and more into structured, LLM-ready Markdown or JSON.
Marker	★ 36.2k	A fast pipeline that converts PDFs and other documents to Markdown, JSON, or HTML while preserving tables, equations, and formatting.
Repomix	★ 26.4k	Repomix packs an entire repository into one file that is easy to feed to AI tools like Claude, ChatGPT, and Gemini.
OpenDataLoader PDF	★ 25.4k	OpenDataLoader PDF turns any PDF into structured Markdown, JSON, or HTML with bounding boxes, and auto-tags untagged files into screen-reader-ready Tagged PDFs.
Unstructured	★ 15k	A library for ingesting and preprocessing many document types into clean, chunked elements ready for RAG pipelines.
Zerox	★ 12.2k	OCR any document by passing page images through a vision model to get Markdown
