NVIDIA · 2026-04-17 · notable
NVIDIA Nemotron OCR v2 — Unified Multilingual OCR, 28x Faster Than PaddleOCR
NVIDIA releases Nemotron OCR v2: a unified multilingual OCR model that handles English, Chinese (Simplified and Traditional), Japanese, Korean, and Russian simultaneously at 34.7 pages/sec on a single A100. That is 28x faster than PaddleOCR v5, with no language switching needed.

A single unified OCR model from NVIDIA that reads six languages at 34.7 pages per second — no per-language models needed.
Key specs
| Spec | Value |
|---|---|
| Inference speed (A100) | 34.7 pages/sec |
| Speed vs. PaddleOCR v5 | 28x faster |
| Synthetic training images | 12.26M |
| v2_english parameters | 54M |
| v2_multilingual parameters | 84M |
| Languages supported | 6 (EN, ZH-S, ZH-T, JA, KO, RU) |
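The headline ratio can be turned into concrete batch times. This is a back-of-the-envelope sketch, assuming the 28x figure is a throughput ratio measured under comparable conditions (it is probably rounded, so the derived PaddleOCR v5 rate is approximate) and that throughput stays constant over a long batch. The 1M-page workload is a hypothetical example.

```python
# Figures from the spec table; the 28x ratio is likely rounded.
NEMOTRON_PAGES_PER_SEC = 34.7  # single A100
SPEEDUP_VS_PADDLE = 28.0


def implied_baseline_pages_per_sec(fast: float, speedup: float) -> float:
    """Throughput of the slower system implied by a speedup ratio."""
    return fast / speedup


def hours_to_process(pages: int, pages_per_sec: float) -> float:
    """Wall-clock hours for a batch, assuming constant sustained throughput."""
    return pages / pages_per_sec / 3600


if __name__ == "__main__":
    paddle = implied_baseline_pages_per_sec(NEMOTRON_PAGES_PER_SEC, SPEEDUP_VS_PADDLE)
    pages = 1_000_000  # hypothetical workload
    print(f"Implied PaddleOCR v5 throughput: {paddle:.2f} pages/sec")
    print(f"Nemotron OCR v2: {hours_to_process(pages, NEMOTRON_PAGES_PER_SEC):.1f} h")
    print(f"PaddleOCR v5:    {hours_to_process(pages, paddle):.1f} h")
```

Under these assumptions, a million pages take about 8 hours on one A100, versus roughly 224 hours at the implied PaddleOCR v5 rate of about 1.24 pages/sec.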
What is it?
Nemotron OCR v2 is NVIDIA's open-weights document OCR system, released April 17 on Hugging Face. It detects text regions, transcribes them, and reconstructs reading order, all in one pass. Unlike legacy OCR pipelines that require a separate model per language, a single Nemotron OCR v2 checkpoint handles English, Chinese (Simplified and Traditional), Japanese, Korean, and Russian simultaneously. Two variants are available: v2_english (54M parameters, word-level output) and v2_multilingual (84M parameters, line-level output suited to mixed-language documents).
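The choice between the two checkpoints can be expressed as a small routing rule. This is a hypothetical helper, not part of the `nemotron-ocr` package: the `Variant` record, the lowercase language codes, and `choose_variant` are illustrative names. The rule itself (English-only documents go to v2_english, anything else to v2_multilingual) is an assumption drawn from the variant descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    params_millions: int
    granularity: str  # "word" or "line" output


# Parameter counts and output granularity as described in the article.
VARIANTS = {
    "v2_english": Variant("v2_english", 54, "word"),
    "v2_multilingual": Variant("v2_multilingual", 84, "line"),
}

# Hypothetical language codes mirroring the spec table.
SUPPORTED = {"en", "zh-s", "zh-t", "ja", "ko", "ru"}


def choose_variant(languages: set[str]) -> Variant:
    """Pick a checkpoint for a document given the languages it contains (hypothetical rule)."""
    langs = {code.lower() for code in languages}
    if not langs:
        raise ValueError("at least one language code is required")
    unsupported = langs - SUPPORTED
    if unsupported:
        raise ValueError(f"unsupported languages: {sorted(unsupported)}")
    # English-only pages get the smaller word-level model; anything else needs the multilingual one.
    if langs == {"en"}:
        return VARIANTS["v2_english"]
    return VARIANTS["v2_multilingual"]
```

For example, `choose_variant({"en", "ja"})` returns the 84M line-level variant, while `choose_variant({"en"})` returns the 54M word-level one.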
How does it work?
The architecture follows the FOTS (Fast Oriented Text Spotting) design around a shared RegNetX-8GF backbone. The backbone computes image features once, and three components reuse them without reprocessing the image: a text detector (bounding boxes), a Transformer-based recognizer (transcription), and a relational model (reading order and layout structure). This shared-backbone design is the primary source of the 28x speed advantage over PaddleOCR v5. Training used 12.26M synthetic images generated from the mOSCAR multilingual corpus with a modified SynthDoG renderer. The model is available via pip install and as a free interactive Hugging Face Space.
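The feature-reuse idea can be illustrated with a toy pipeline that counts backbone passes. Everything here is a hypothetical stand-in: `CountingBackbone` replaces the RegNetX-8GF network, the head functions return fixed placeholder values, and the reading-order step is a simple top-to-bottom, left-to-right sort rather than NVIDIA's learned relational model. It shows only the structural point (one backbone pass per page instead of one per component), not the size of the real speedup.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), hypothetical box format


class CountingBackbone:
    """Stand-in for the shared feature extractor; records how many times it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: str) -> dict:
        self.calls += 1
        return {"source": image}  # placeholder for a feature map


def detect(features: dict) -> list[Box]:
    # Placeholder detector: two boxes, deliberately returned lower line first.
    return [(0, 10, 100, 20), (0, 0, 100, 8)]


def recognize(features: dict, boxes: list[Box]) -> list[str]:
    # Placeholder recognizer: one string per box.
    return [f"line@y={box[1]}" for box in boxes]


def reading_order(features: dict, boxes: list[Box]) -> list[int]:
    # Heuristic stand-in for the relational model: sort by top edge, then left edge.
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))


def shared_pipeline(image: str, backbone: CountingBackbone) -> list[str]:
    features = backbone(image)  # computed once, reused by all three components
    boxes = detect(features)
    texts = recognize(features, boxes)
    return [texts[i] for i in reading_order(features, boxes)]


def unshared_pipeline(image: str, backbone: CountingBackbone) -> list[str]:
    # Each component re-encodes the image, as separate models would.
    boxes = detect(backbone(image))
    texts = recognize(backbone(image), boxes)
    return [texts[i] for i in reading_order(backbone(image), boxes)]
```

Both functions return `["line@y=0", "line@y=10"]` for any input page, but the shared version calls the backbone once per page and the unshared one three times.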
Why does it matter?
Production OCR pipelines typically require separate models per language, adding operational complexity and latency. A single unified checkpoint removes that overhead. At 34.7 pages/sec on an A100, real-time multilingual document processing is practical. The model ships under a permissive license, and NVIDIA has also released the full 12.26M-image synthetic training dataset (CC-BY-4.0), a useful resource for fine-tuning on domain-specific documents.
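To make the throughput figure operational, here is a rough capacity-planning sketch. It assumes the quoted 34.7 pages/sec is sustainable per A100 and applies a hypothetical 70% utilization factor to leave headroom for I/O, preprocessing, and load spikes. The 10M pages/day target is likewise an illustrative number, not one from the release.

```python
import math

A100_PAGES_PER_SEC = 34.7  # quoted single-GPU throughput
SECONDS_PER_DAY = 86_400


def per_page_latency_ms(pages_per_sec: float) -> float:
    """Amortized GPU time per page at steady-state throughput (not end-to-end latency)."""
    return 1000.0 / pages_per_sec


def gpus_needed(pages_per_day: int, pages_per_sec: float, utilization: float = 0.7) -> int:
    """Smallest GPU count whose derated daily capacity covers the target."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    daily_capacity = pages_per_sec * utilization * SECONDS_PER_DAY
    return math.ceil(pages_per_day / daily_capacity)
```

With these assumptions, `per_page_latency_ms(A100_PAGES_PER_SEC)` is about 28.8 ms, and `gpus_needed(10_000_000, A100_PAGES_PER_SEC)` returns 5, since each derated GPU covers roughly 2.1M pages per day.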
Who is it for?
Document processing engineers and ML practitioners handling multilingual text at scale.
Try it
```shell
pip install nemotron-ocr
```

Then, in Python:

```python
from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2
```