PaddleOCR

Detect and recognize text in images and parse documents across 100+ languages

Repository: github.com/PaddlePaddle/PaddleOCR (★ 83.1k)
Documentation: paddlepaddle.github.io/PaddleOCR

Overview

PaddleOCR is an open-source toolkit for optical character recognition (OCR) and document parsing. It detects where text appears in an image and reads it back as machine-readable text, with support for more than 100 languages. Beyond plain text, it can turn PDFs and scanned images into structured outputs like JSON and Markdown.
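
As a rough picture of what "detects where text appears and reads it back" produces, here is a minimal sketch of recognized text lines serialized to JSON. The TextLine record, its field names, and lines_to_json are hypothetical and do not reproduce PaddleOCR's own output schema. The sketch also assumes each detected region is a four-point polygon, which is what text detectors commonly return.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextLine:
    """Hypothetical record for one detected and recognized text line.

    Not PaddleOCR's result schema; field names are illustrative only.
    """
    text: str                        # recognized string
    score: float                     # recognition confidence in [0, 1]
    polygon: list[tuple[int, int]]   # detected region as (x, y) corner points

    def bbox(self) -> tuple[int, int, int, int]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


def lines_to_json(lines: list[TextLine]) -> str:
    """Serialize text lines to JSON, adding the derived bounding box to each record."""
    records = [{**asdict(line), "bbox": list(line.bbox())} for line in lines]
    return json.dumps(records, ensure_ascii=False, indent=2)


line = TextLine("Invoice #123", 0.98, [(10, 20), (120, 18), (121, 40), (11, 42)])
print(lines_to_json([line]))
```

Keeping both the polygon and a derived box is a common choice: the polygon preserves rotated or skewed text, while the box is easier to sort and crop with.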

It is aimed at developers who need to extract text or document structure from images at scale: think pipelines that feed scanned documents into search, retrieval-augmented generation (RAG), or LLM agents. The project ships ready-to-use models such as the PP-OCRv6 recognition models and the PP-StructureV3 document pipeline, so you do not have to train anything to get started.

As a computer-vision tool, PaddleOCR covers both general scene text spotting (IDs, street views, books, industrial parts) and layout-aware document conversion. It runs on Linux, Windows, and macOS, and supports CPU and GPU backends.

What it does

Recognizes text in 100+ languages, with PP-OCRv6 covering 50 languages (Chinese, English, Japanese, and 46 Latin-script languages) in a single unified model
Converts complex PDFs and images into structured Markdown or JSON using the PP-StructureV3 pipeline, including table and text coordinates
Includes the PaddleOCR-VL document vision-language model for parsing tables, formulas, and other document elements
Handles natural scene text such as IDs, street views, books, and industrial components
Offers PP-OCRv6 in three size tiers (tiny, small, medium) for edge, mobile, and server deployment
Runs on Linux, Windows, and macOS across CPU, GPU, and other AI accelerators
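
Layout-aware output depends on the text coordinates mentioned above. To illustrate why coordinates matter, the sketch below puts recognized lines into a simple reading order: it groups boxes whose vertical centers are close into rows, then reads each row left to right. This is a basic heuristic assumed for illustration, not PP-StructureV3's layout analysis; the (text, box) input shape, reading_order, and row_tolerance are hypothetical.

```python
# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[float, float, float, float]


def reading_order(items: list[tuple[str, Box]], row_tolerance: float = 10.0) -> list[str]:
    """Order text items top-to-bottom by row, then left-to-right within each row.

    A simple illustrative heuristic, not PaddleOCR's layout analysis.
    """
    # Visit items by vertical center so rows are discovered top to bottom.
    by_y = sorted(items, key=lambda it: (it[1][1] + it[1][3]) / 2)
    rows: list[list[tuple[str, Box]]] = []
    row_center = 0.0
    for item in by_y:
        center = (item[1][1] + item[1][3]) / 2
        if rows and abs(center - row_center) <= row_tolerance:
            rows[-1].append(item)      # close enough: same visual row
        else:
            rows.append([item])        # start a new row anchored at this center
            row_center = center
    ordered: list[str] = []
    for row in rows:
        row.sort(key=lambda it: it[1][0])  # left edge decides order within a row
        ordered.extend(text for text, _ in row)
    return ordered


items = [
    ("World", (120, 12, 200, 32)),
    ("Hello", (10, 10, 100, 30)),
    ("Second line", (10, 50, 150, 70)),
]
print(reading_order(items))  # ['Hello', 'World', 'Second line']
```

A fixed row tolerance breaks down on multi-column pages and tables, which is the kind of case a dedicated layout pipeline is meant to handle.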

Getting started

Install PaddleOCR and an inference engine, then run text detection and recognition on an image with a few lines of Python.

Install PaddleOCR

Install the library with all optional dependencies using pip.

bash

python -m pip install "paddleocr[all]"

Install the inference engine

PaddleOCR needs an inference backend. For CPU, install PaddlePaddle from the official index.

bash

python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
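
Before running OCR, you may want to confirm that both packages are importable in the active environment. The sketch below uses only the standard library. It relies on the fact that the paddlepaddle package is imported as paddle; the missing_modules helper is a hypothetical name.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "paddleocr" is the library itself; "paddle" is the import name of paddlepaddle.
missing = missing_modules(["paddleocr", "paddle"])
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("PaddleOCR and the PaddlePaddle engine are importable.")
```

This only checks that the modules can be located. If both are present, PaddlePaddle's own paddle.utils.run_check() performs a more thorough check of the installation.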

Run OCR on an image

Create a PaddleOCR pipeline, call predict on your image file, and print each result, which includes the recognized text.

python

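from paddleocr import PaddleOCR

# Build an OCR pipeline with the optional preprocessing modules switched off:
# document orientation classification, document unwarping, and text-line orientation.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle"  # run inference with the PaddlePaddle engine installed above
)

# Run text detection and recognition on the image, then print each result.
result = ocr.predict("./image.png")
for res in result:
    res.print()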
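
Printing shows the whole result. To pass only the recognized text into a downstream pipeline, you can filter lines by confidence. The sketch below assumes each result can be read like a mapping with parallel rec_texts and rec_scores lists. Those key names match what PaddleOCR 3.x prints for OCR results, but check them against the version you install. The collect_text helper and the min_score threshold are hypothetical.

```python
from collections.abc import Iterable, Mapping


def collect_text(results: Iterable[Mapping], min_score: float = 0.5) -> list[str]:
    """Gather recognized strings whose confidence is at least min_score.

    Assumes each result exposes parallel "rec_texts" and "rec_scores" sequences
    (an assumption about the result layout; verify for your PaddleOCR version).
    """
    lines: list[str] = []
    for res in results:
        for text, score in zip(res["rec_texts"], res["rec_scores"]):
            if score >= min_score:
                lines.append(text)
    return lines


# Plain dicts standing in for the objects returned by ocr.predict(...).
fake_results = [
    {"rec_texts": ["TOTAL", "$12.50", "~~"], "rec_scores": [0.99, 0.97, 0.21]},
]
print(collect_text(fake_results))  # ['TOTAL', '$12.50']
```

With real output you would pass the list returned by ocr.predict("./image.png") instead of fake_results. The right threshold depends on your documents, so treat 0.5 as a placeholder.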

Or use the command line

You can run the same OCR pipeline directly from the terminal without writing code.

bash

paddleocr ocr -i ./image.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --engine paddle
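
To process a folder of images from a script, one option is to build the same command for each file and run it with subprocess. The sketch below reuses only the flags shown above. The build_ocr_command and run_batch helpers are hypothetical, and the sketch assumes the paddleocr executable is on your PATH.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image: Path) -> list[str]:
    """Build the paddleocr CLI call shown above for a single image."""
    return [
        "paddleocr", "ocr", "-i", str(image),
        "--use_doc_orientation_classify", "False",
        "--use_doc_unwarping", "False",
        "--use_textline_orientation", "False",
        "--engine", "paddle",
    ]


def run_batch(folder: Path, pattern: str = "*.png") -> None:
    """Run OCR on every matching image in folder, one CLI process per file."""
    for image in sorted(folder.glob(pattern)):
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_ocr_command(image), check=True)


# Show the command that would run for one (hypothetical) scanned page.
print(build_ocr_command(Path("scans/page1.png")))
```

Each CLI process loads the models again, so for large batches the Python API, which keeps one PaddleOCR instance in memory, is usually the better fit.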

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Extracting text from scanned documents, receipts, or IDs to feed into search or data pipelines
Converting PDFs and images into Markdown or JSON for retrieval-augmented generation (RAG) and LLM agents
Reading multilingual text from photos and scans without switching models
Recognizing text in real-world scenes such as street views, books, or industrial components
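
For the RAG use case, recognized lines are usually grouped into overlapping chunks before they are embedded. The sketch below is a generic character-budget chunker and is not part of PaddleOCR. The chunk_lines function and its max_chars and overlap_lines parameters are hypothetical, and real pipelines often chunk by tokens or by document sections instead.

```python
def chunk_lines(lines: list[str], max_chars: int = 500, overlap_lines: int = 1) -> list[str]:
    """Pack OCR lines into newline-joined chunks of at most max_chars characters.

    Consecutive chunks share up to overlap_lines lines for context when there is
    room. A single line longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in lines:
        if current and len("\n".join(current + [line])) > max_chars:
            chunks.append("\n".join(current))
            # Carry the last overlap_lines lines forward for context...
            tail = current[-overlap_lines:] if overlap_lines > 0 else []
            # ...but only if they still leave room for the new line.
            current = tail if len("\n".join(tail + [line])) <= max_chars else []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


print(chunk_lines(["aaaa", "bbbb", "cccc"], max_chars=9, overlap_lines=1))
# ['aaaa\nbbbb', 'bbbb\ncccc']
```

The overlap trades some duplicated text for better recall when an answer spans a chunk boundary.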

How PaddleOCR compares

PaddleOCR alongside the other open-source vision and understanding tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
PaddleOCR	★ 83.1k	Detect and recognize text in images and parse documents across 100+ languages
Ultralytics YOLO	★ 58.6k	A framework for training and running YOLO models for real-time object detection, segmentation, and tracking.
Supervision	★ 44.7k	A Python toolkit for processing, annotating, and visualizing detections and segmentations from many vision models.
MMDetection	★ 32.8k	An OpenMMLab toolbox with many object detection and instance segmentation algorithms for research and production.
Segment Anything 2 (SAM 2)	★ 19.4k	Meta's model for segmenting and tracking any object across images and video frames from clicks or boxes.
Grounded-SAM	★ 17.6k	A pipeline that combines Grounding DINO and Segment Anything to detect and segment objects from text prompts.
DINOv3	★ 10.7k	Meta's self-supervised vision backbone that produces general-purpose image features for many downstream tasks.
Segment Anything 3 (SAM 3)	★ 10.6k	Meta's segmentation model that detects, segments, and tracks objects in images and video from text or visual prompts.
