AI/TLDR

PaddleOCR

Detect and recognize text in images and parse documents across 100+ languages

Overview

PaddleOCR is an open-source toolkit for optical character recognition (OCR) and document parsing. It detects where text appears in an image and reads it back as machine-readable text, with support for more than 100 languages. Beyond plain text, it can turn PDFs and scanned images into structured outputs like JSON and Markdown.

It is aimed at developers who need to extract text or document structure from images at scale: think pipelines that feed scanned documents into search, retrieval-augmented generation (RAG), or LLM agents. The project ships ready-to-use models such as the PP-OCRv6 recognition models and the PP-StructureV3 document pipeline, so you do not have to train anything to get started.

As a computer-vision tool, PaddleOCR covers both general scene text spotting (IDs, street views, books, industrial parts) and layout-aware document conversion. It runs on Linux, Windows, and macOS, and supports CPU and GPU backends.

What it does

  • Recognizes text in 100+ languages, with PP-OCRv6 covering 50 languages (Chinese, English, Japanese, and 46 Latin-script languages) in a single unified model
  • Converts complex PDFs and images into structured Markdown or JSON using the PP-StructureV3 pipeline, including table and text coordinates
  • Includes the PaddleOCR-VL document vision-language model for parsing tables, formulas, and other document elements
  • Handles natural scene text such as IDs, street views, books, and industrial components
  • Offers PP-OCRv6 in three size tiers (tiny, small, medium) for edge, mobile, and server deployment
  • Runs on Linux, Windows, and macOS across CPU, GPU, and other AI accelerators

Getting started

Install PaddleOCR and an inference engine, then run text detection and recognition on an image with a few lines of Python.

Install PaddleOCR

Install the library with all optional dependencies using pip.

```bash
python -m pip install "paddleocr[all]"
```

Install the inference engine

PaddleOCR needs an inference backend. For CPU, install PaddlePaddle from the official index.

```bash
python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
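Before running anything, it can help to confirm that both packages are importable. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and it assumes the import names are `paddleocr` and `paddle` (the module that the `paddlepaddle` package installs).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # "paddle" is the import name of the paddlepaddle package.
    missing = missing_modules(["paddleocr", "paddle"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("PaddleOCR and PaddlePaddle are importable.")
```

This only checks that the modules can be located; it does not load any models or verify that the backend works on your hardware.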

Run OCR on an image

Create a PaddleOCR pipeline and call predict on your image file, then print the recognized text.

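```python
from paddleocr import PaddleOCR

# Build the OCR pipeline with the optional preprocessing stages turned off.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    engine="paddle",
)

# Run detection and recognition, then print each result.
result = ocr.predict("./image.png")
for res in result:
    res.print()
```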
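`res.print()` writes everything to the console. To keep only the confidently recognized lines, one option is a small filter like the sketch below. It assumes each result exposes parallel `rec_texts` and `rec_scores` sequences through dictionary-style access, as PaddleOCR 3.x results are understood to do; check the field names against your installed version. The helper `confident_lines`, the 0.8 threshold, and the sample data are illustrative and not part of PaddleOCR.

```python
def confident_lines(result, min_score=0.8):
    """Return (text, score) pairs whose recognition score is at least min_score.

    `result` is any mapping with parallel "rec_texts" and "rec_scores"
    sequences (assumed field names; verify against your PaddleOCR version).
    """
    pairs = zip(result["rec_texts"], result["rec_scores"])
    return [(text, float(score)) for text, score in pairs if score >= min_score]


# Made-up stand-in for one item of `ocr.predict(...)`, so the sketch runs without models.
sample = {
    "rec_texts": ["Invoice 2024-001", "T0tal: 4Z.00", "Thank you"],
    "rec_scores": [0.98, 0.41, 0.93],
}

for text, score in confident_lines(sample):
    print(f"{score:.2f}  {text}")
```

In a real pipeline you would pass each `res` from the loop above instead of `sample`, and tune the threshold to your documents.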

Or use the command line

You can run the same OCR pipeline directly from the terminal without writing code.

```bash
paddleocr ocr -i ./image.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --engine paddle
```
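To run the same command over a folder of images, one option is to drive the CLI from Python. The sketch below reuses only the flags shown above. The helper `build_ocr_command`, the `./scans` folder, and the `*.png` pattern are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image_path):
    """Build the argument list for the `paddleocr ocr` invocation shown above."""
    return [
        "paddleocr", "ocr",
        "-i", str(image_path),
        "--use_doc_orientation_classify", "False",
        "--use_doc_unwarping", "False",
        "--use_textline_orientation", "False",
        "--engine", "paddle",
    ]


if __name__ == "__main__":
    # Hypothetical input folder; adjust the path and pattern to your data.
    for image in sorted(Path("./scans").glob("*.png")):
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_ocr_command(image), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. Each call starts a new process, so for large batches the Python API is likely the better fit.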

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Extracting text from scanned documents, receipts, or IDs to feed into search or data pipelines
  • Converting PDFs and images into Markdown or JSON for retrieval-augmented generation (RAG) and LLM agents
  • Reading multilingual text from photos and scans without switching models
  • Recognizing text in real-world scenes such as street views, books, or industrial components
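For the RAG use case, Markdown output usually needs to be split into chunks before indexing. Here is a minimal sketch, assuming the Markdown marks sections with ATX headings (`#`, `##`, and so on). The function `split_markdown_by_heading` and the sample text are hypothetical and independent of PaddleOCR.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_markdown_by_heading(markdown):
    """Split Markdown into (heading, body) chunks, one per ATX heading.

    Text before the first heading is returned under an empty heading.
    """
    chunks = []
    heading, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            if heading or any(l.strip() for l in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if heading or any(l.strip() for l in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


sample = "# Report\nIntro text.\n\n## Results\nTable goes here.\n"
for title, text in split_markdown_by_heading(sample):
    print(repr(title), "->", repr(text))
```

This line-based split is deliberately simple: it would also treat a `#` comment inside a fenced code block as a heading, so a real pipeline may want a proper Markdown parser.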

How PaddleOCR compares

PaddleOCR alongside other open-source vision & understanding tools AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| PaddleOCR | ★ 83.1k | Detect and recognize text in images and parse documents across 100+ languages |
| Ultralytics YOLO | ★ 58.6k | A framework for training and running YOLO models for real-time object detection, segmentation, and tracking. |
| Supervision | ★ 44.7k | A Python toolkit for processing, annotating, and visualizing detections and segmentations from many vision models. |
| MMDetection | ★ 32.8k | An OpenMMLab toolbox with many object detection and instance segmentation algorithms for research and production. |
| Segment Anything 2 (SAM 2) | ★ 19.4k | Meta's model for segmenting and tracking any object across images and video frames from clicks or boxes. |
| Grounded-SAM | ★ 17.6k | A pipeline that combines Grounding DINO and Segment Anything to detect and segment objects from text prompts. |
| DINOv3 | ★ 10.7k | Meta's self-supervised vision backbone that produces general-purpose image features for many downstream tasks. |
| Segment Anything 3 (SAM 3) | ★ 10.6k | Meta's segmentation model that detects, segments, and tracks objects in images and video from text or visual prompts. |