Overview
PaddleOCR is an open-source toolkit for optical character recognition (OCR) and document parsing. It detects where text appears in an image and reads it back as machine-readable text, with support for more than 100 languages. Beyond plain text, it can turn PDFs and scanned images into structured outputs like JSON and Markdown.
It is aimed at developers who need to extract text or document structure from images at scale: think pipelines that feed scanned documents into search, retrieval-augmented generation (RAG), or LLM agents. The project ships ready-to-use models such as the PP-OCRv6 recognition models and the PP-StructureV3 document pipeline, so you do not have to train anything to get started.
As a computer-vision tool, PaddleOCR covers both general scene text spotting (IDs, street views, books, industrial parts) and layout-aware document conversion. It runs on Linux, Windows, and macOS, and supports CPU and GPU backends.
What it does
- Recognizes text in 100+ languages, with PP-OCRv6 covering 50 languages (Chinese, English, Japanese, and 46 Latin-script languages) in a single unified model
- Converts complex PDFs and images into structured Markdown or JSON using the PP-StructureV3 pipeline, including table and text coordinates
- Includes the PaddleOCR-VL document vision-language model for parsing tables, formulas, and other document elements
- Handles natural scene text such as IDs, street views, books, and industrial components
- Offers PP-OCRv6 in three size tiers (tiny, small, medium) for edge, mobile, and server deployment
- Runs on Linux, Windows, and macOS across CPU, GPU, and other AI accelerators
Getting started
Install PaddleOCR and an inference engine, then run text detection and recognition on an image with a few lines of Python.
Install PaddleOCR
Install the library with all optional dependencies using pip.
python -m pip install "paddleocr[all]"
Install the inference engine
PaddleOCR needs an inference backend. For CPU, install PaddlePaddle from the official index.
python -m pip install paddlepaddle==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
Run OCR on an image
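Before running the pipeline, it can help to confirm that both packages are importable in the active environment. The sketch below is a small preflight check using only the standard library, not part of PaddleOCR itself. The helper name `check_install` is hypothetical, and it assumes the PaddlePaddle wheel is imported under the module name `paddle`.

```python
from importlib.util import find_spec


def check_install(modules=("paddleocr", "paddle")):
    """Return a mapping of module name -> whether it can be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If either entry reports missing, rerun the matching install command in the same interpreter you use to run the script.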
Create a PaddleOCR pipeline and call predict on your image file, then print the recognized text.
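from paddleocr import PaddleOCR

# Build the OCR pipeline with the optional preprocessing steps turned off.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,  # skip whole-document orientation classification
    use_doc_unwarping=False,             # skip document unwarping
    use_textline_orientation=False,      # skip per-line orientation classification
    engine="paddle"
)

# predict returns one result object per input image.
result = ocr.predict("./image.png")
for res in result:
    res.print()
Or use the command line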
You can run the same OCR pipeline directly from the terminal without writing code.
paddleocr ocr -i ./image.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --engine paddle
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
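Once a pipeline has produced results, a common next step is to drop low-confidence lines before passing text downstream. The following is a minimal sketch, assuming each result can be read as a mapping with parallel lists under the keys `rec_texts` and `rec_scores`. Those key names, the helper `confident_lines`, and the sample data are assumptions for illustration, so check the fields your installed version actually prints.

```python
def confident_lines(result, min_score=0.8):
    """Keep recognized lines whose confidence is at least min_score."""
    texts = result["rec_texts"]    # assumed key: list of recognized strings
    scores = result["rec_scores"]  # assumed key: list of confidences in [0, 1]
    return [text for text, score in zip(texts, scores) if score >= min_score]


# Hypothetical sample standing in for one OCR result.
sample = {
    "rec_texts": ["INVOICE", "T0tal: 42.00", "Thank you"],
    "rec_scores": [0.99, 0.61, 0.93],
}

print(confident_lines(sample))  # ['INVOICE', 'Thank you']
```

The threshold is a trade-off: a higher value removes more misreads but also drops more genuine text, so it is worth tuning on a handful of your own images.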
When to use it
- Extracting text from scanned documents, receipts, or IDs to feed into search or data pipelines
- Converting PDFs and images into Markdown or JSON for retrieval-augmented generation (RAG) and LLM agents
- Reading multilingual text from photos and scans without switching models
- Recognizing text in real-world scenes such as street views, books, or industrial components
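For the RAG use case above, Markdown output usually needs to be cut into retrievable pieces before indexing. The sketch below shows one simple way to do that with the standard library, splitting at ATX headings (`#` through `######`). It is independent of PaddleOCR, the function name `split_markdown_by_heading` is hypothetical, and it assumes headings do not appear inside fenced code blocks.

```python
import re


def split_markdown_by_heading(markdown):
    """Split Markdown text into (heading, body) chunks at ATX headings."""
    chunks = []
    heading, body = "", []

    def flush():
        # Emit the current chunk only if it has a heading or non-blank body.
        if heading or any(line.strip() for line in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            flush()  # a new heading closes the previous chunk
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks


# Hypothetical Markdown standing in for a converted document.
doc = "# Invoice\nTotal due: 42.00\n\n## Notes\nPay within 30 days.\n"
for title, text in split_markdown_by_heading(doc):
    print(f"{title!r}: {text!r}")
```

Text that appears before the first heading is kept as a chunk with an empty heading, so nothing from the document is silently dropped.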
How PaddleOCR compares
How PaddleOCR sits alongside the other open-source vision and understanding tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| PaddleOCR | ★ 83.1k | An OCR toolkit that detects and recognizes text in images and parses documents across 100+ languages. |
| Ultralytics YOLO | ★ 58.6k | A framework for training and running YOLO models for real-time object detection, segmentation, and tracking. |
| Supervision | ★ 44.7k | A Python toolkit for processing, annotating, and visualizing detections and segmentations from many vision models. |
| MMDetection | ★ 32.8k | An OpenMMLab toolbox with many object detection and instance segmentation algorithms for research and production. |
| Segment Anything 2 (SAM 2) | ★ 19.4k | Meta's model for segmenting and tracking any object across images and video frames from clicks or boxes. |
| Grounded-SAM | ★ 17.6k | A pipeline that combines Grounding DINO and Segment Anything to detect and segment objects from text prompts. |
| DINOv3 | ★ 10.7k | Meta's self-supervised vision backbone that produces general-purpose image features for many downstream tasks. |
| Segment Anything 3 (SAM 3) | ★ 10.6k | Meta's segmentation model that detects, segments, and tracks objects in images and video from text or visual prompts. |