NVIDIA · 2026-04-17 · notable

NVIDIA Nemotron OCR v2 — Unified Multilingual OCR, 28x Faster Than PaddleOCR

Rating: 3
Author: AI/TLDR

NVIDIA releases Nemotron OCR v2, a unified multilingual OCR model that handles English, Chinese (Simplified and Traditional), Japanese, Korean, and Russian with a single checkpoint. It runs at 34.7 pages/sec on a single A100, 28x faster than PaddleOCR v5, with no language switching needed.

[Image: NVIDIA Nemotron OCR v2 model card on Hugging Face]

A single unified OCR model from NVIDIA that reads six languages at 34.7 pages per second — no per-language models needed.

Key specs

Inference speed (A100)	34.7 pages/sec
Speed vs. PaddleOCR v5	28x faster
Synthetic training images	12.26M
v2_english parameters	54M
v2_multilingual parameters	84M
Languages supported	6 (EN, ZH-S, ZH-T, JA, KO, RU)

What is it?

Nemotron OCR v2 is NVIDIA's open-weights document OCR system, released on Hugging Face on April 17, 2026. It detects text regions, transcribes them, and reconstructs reading order in a single pass. Unlike legacy OCR pipelines that require a separate model per language, a single multilingual Nemotron OCR v2 checkpoint handles English, Chinese (Simplified and Traditional), Japanese, Korean, and Russian. Two variants are available: v2_english (54M parameters, word-level output) and v2_multilingual (84M parameters, line-level output for mixed-language documents).

How does it work?

The architecture uses a shared RegNetX-8GF backbone (FOTS-based) whose computed features are reused by three components without reprocessing the image: a text detector (bounding boxes), a Transformer-based recognizer (transcription), and a relational model (reading order and layout structure). This shared-backbone design is the primary source of the 28x speed advantage over PaddleOCR v5. Training used 12.26M synthetic images generated from the mOSCAR multilingual corpus via a modified SynthDoG renderer. The model is available via pip install and through a free interactive Hugging Face Space.
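
To make the shared-backbone idea concrete, here is a toy sketch in plain Python. It is not NVIDIA's implementation: CountingBackbone, detect, recognize, and reading_order are hypothetical stand-ins invented for illustration, and the "features" are just row sums. The only point is that computing features once and passing them to all three stages gives the same result as re-running the backbone for each stage, with a third of the backbone work.

```python
# Toy illustration of feature reuse. All names are hypothetical and
# unrelated to the real Nemotron OCR codebase.

class CountingBackbone:
    """Toy feature extractor that counts how many times it is run."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        # Stand-in "features": one number per image row.
        return [sum(row) for row in image]


def detect(features):
    """Toy detector: any row containing ink counts as a text region."""
    return [i for i, value in enumerate(features) if value > 0]


def recognize(features, regions):
    """Toy recognizer: emit a placeholder string for each region."""
    return {i: f"line{i}(ink={features[i]})" for i in regions}


def reading_order(features, regions):
    """Toy relational step: read regions from top to bottom."""
    return sorted(regions)


def run_shared(backbone, image):
    """Compute features once and hand them to all three stages."""
    features = backbone(image)
    regions = detect(features)
    texts = recognize(features, regions)
    return [texts[i] for i in reading_order(features, regions)]


def run_separate(backbone, image):
    """Re-run the backbone for every stage, as independent models would."""
    regions = detect(backbone(image))
    texts = recognize(backbone(image), regions)
    order = reading_order(backbone(image), regions)
    return [texts[i] for i in order]


image = [[0, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 1]]
shared, separate = CountingBackbone(), CountingBackbone()
shared_out = run_shared(shared, image)
separate_out = run_separate(separate, image)

print(shared_out, "backbone calls:", shared.calls)
print(separate_out, "backbone calls:", separate.calls)
```

Both paths return the same two lines of placeholder text; the shared path invokes the backbone once and the separate path three times. In a real model the backbone is usually the most expensive part of the forward pass, which is why reusing its output matters.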

Why does it matter?

Production OCR pipelines typically require separate models per language, adding operational complexity and latency. A single unified checkpoint removes that overhead. At 34.7 pages/sec on an A100, real-time multilingual document processing is practical. The model ships under a permissive license, and the full 12.26M-image synthetic training dataset is also released under CC-BY-4.0, which is useful for fine-tuning on domain-specific documents.
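
As a rough back-of-the-envelope check on what those numbers mean at scale, the sketch below converts the reported figures into hourly throughput and batch processing time. The one-million-page corpus is a hypothetical workload, and the baseline rate is only implied by dividing 34.7 pages/sec by the reported 28x speedup; it is not a measured PaddleOCR v5 figure from the release.

```python
# Back-of-the-envelope throughput arithmetic from the two reported figures.

PAGES_PER_SEC_A100 = 34.7   # reported throughput on a single A100
SPEEDUP_VS_PADDLE = 28.0    # reported speedup over PaddleOCR v5


def pages_per_hour(pages_per_sec):
    """Convert a per-second rate into a per-hour rate."""
    return pages_per_sec * 3600


def hours_to_process(num_pages, pages_per_sec):
    """Wall-clock hours needed for num_pages at a steady rate."""
    return num_pages / pages_per_sec / 3600


# Implied baseline rate, derived from the two figures (an assumption,
# not a benchmark result).
implied_baseline = PAGES_PER_SEC_A100 / SPEEDUP_VS_PADDLE

corpus = 1_000_000  # hypothetical workload size in pages

fast_hours = hours_to_process(corpus, PAGES_PER_SEC_A100)
slow_hours = hours_to_process(corpus, implied_baseline)

print(f"{pages_per_hour(PAGES_PER_SEC_A100):,.0f} pages/hour on one A100")
print(f"implied baseline: {implied_baseline:.2f} pages/sec")
print(f"{corpus:,} pages: {fast_hours:.1f} h vs {slow_hours:.1f} h")
```

Under these assumptions a single A100 handles roughly 124,920 pages per hour, so a million-page backlog takes about 8 hours instead of about 224 hours at the implied baseline rate. Real throughput will depend on page size, text density, and batching.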

Who is it for?

Document processing engineers and ML practitioners handling multilingual text at scale.

Try it

pip install nemotron-ocr

# then, in Python:
from nemotron_ocr.inference.pipeline_v2 import NemotronOCRV2