NIST CAISI · 2026-05-01 · major
NIST CAISI Evaluation: DeepSeek V4 Pro Lags U.S. Frontier by ~8 Months Across Five Domains
First independent US-government technical evaluation of DeepSeek V4 Pro. CAISI finds it the most capable Chinese model tested but trailing the US frontier by about eight months, performing similarly to GPT-5.

NIST's CAISI has published its first independent technical evaluation of DeepSeek V4 Pro, covering five capability domains.
Key specs
| Spec | Value |
|---|---|
| Capability gap vs. US frontier | ~8 months |
| Benchmarks evaluated | 9 |
| Capability domains | 5 |
| Held-out benchmarks | ARC-AGI-2 (semi-private split), PortBench |
| Cost vs. GPT-5.4 mini | 53% cheaper to 41% more expensive across 7 benchmarks |
What is it?
CAISI is the Center for AI Standards and Innovation, NIST's AI-evaluation arm, stood up under the 2025 America's AI Action Plan. The V4 Pro report is its second public DeepSeek evaluation, following the September 2025 evaluation of earlier DeepSeek models. It tests V4 Pro on nine benchmarks spanning five domains: cybersecurity, software engineering, natural sciences, abstract reasoning, and mathematics.
How does it work?
Two of the nine benchmarks are held out from the public so that benchmark gaming can be detected: ARC-AGI-2's semi-private split and PortBench, a software-engineering benchmark built internally by CAISI. Models are scored on both capability (against US frontier baselines) and cost per task. The aggregate finding: V4 Pro performs similarly to GPT-5, which shipped about eight months earlier, even though DeepSeek's own reporting suggests near-parity with current US frontier models.
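One way to picture why held-out benchmarks help is to compare a model's margin against a baseline on public tasks with its margin on unseen tasks. The Python sketch below is an illustration under stated assumptions, not CAISI's published method: `BenchmarkResult`, `held_out_gap`, the benchmark names, and every score are hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BenchmarkResult:
    name: str
    held_out: bool         # True if the tasks are not publicly available
    model_score: float     # fraction of tasks solved by the model under test
    baseline_score: float  # fraction solved by the reference (frontier) model


def held_out_gap(results: list[BenchmarkResult]) -> float:
    """Mean (model - baseline) margin on public benchmarks minus the same
    margin on held-out benchmarks. A clearly positive value means the model
    looks relatively stronger on public tasks than on unseen ones."""
    public = [r.model_score - r.baseline_score for r in results if not r.held_out]
    hidden = [r.model_score - r.baseline_score for r in results if r.held_out]
    if not public or not hidden:
        raise ValueError("need at least one public and one held-out benchmark")
    return mean(public) - mean(hidden)


# Placeholder scores for illustration only; NOT taken from the CAISI report.
example = [
    BenchmarkResult("public_a", held_out=False, model_score=0.80, baseline_score=0.82),
    BenchmarkResult("public_b", held_out=False, model_score=0.70, baseline_score=0.74),
    BenchmarkResult("held_out_a", held_out=True, model_score=0.40, baseline_score=0.55),
]

gap = held_out_gap(example)
print(f"public-vs-held-out gap: {gap:+.2f}")
```

A gap near zero would suggest the model's standing is consistent across public and unseen tasks; a large positive gap is the kind of signal that held-out splits are meant to surface.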
Why does it matter?
DeepSeek V4 Pro is the most-discussed Chinese model release of 2026, with vendor benchmarks claiming it closes the gap with GPT-5.5 and Gemini 3.1 Pro. CAISI's independent numbers give policymakers and enterprise buyers a reference point that does not rely on vendor self-reporting. They also show that V4 Pro is more cost-efficient than GPT-5.4 mini on five of seven benchmarks, with its cost ranging from 53% cheaper to 41% more expensive depending on the task.
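The "53% cheaper to 41% more expensive" range can be read as a relative cost difference against GPT-5.4 mini on each benchmark. The short Python sketch below shows that arithmetic, assuming the percentages are computed relative to the reference model's per-task cost; the function names and the dollar figures are hypothetical, chosen only to reproduce the two endpoints.

```python
def relative_cost(model_cost: float, reference_cost: float) -> float:
    """Signed cost difference as a fraction of the reference cost.
    Negative means the model is cheaper than the reference."""
    if reference_cost <= 0:
        raise ValueError("reference_cost must be positive")
    return (model_cost - reference_cost) / reference_cost


def describe(delta: float) -> str:
    """Turn a signed relative difference into a readable label."""
    pct = round(abs(delta) * 100)
    if pct == 0:
        return "same cost"
    return f"{pct}% cheaper" if delta < 0 else f"{pct}% more expensive"


# Hypothetical per-task costs in dollars; NOT figures from the CAISI report.
print(describe(relative_cost(0.47, 1.00)))  # 53% cheaper
print(describe(relative_cost(1.41, 1.00)))  # 41% more expensive
```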
Who is it for?
AI policy analysts, enterprise procurement teams comparing US and Chinese frontier models, and infosec leads weighing open-weight Chinese models.
Try it
https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro