New AI Datasets — Training & Eval Corpora
New training and evaluation datasets for AI — fresh corpora and benchmarks, with what's inside each one and how to use it.
3 releases tracked
- ClawBench — 153 Real-World Browser-Agent Tasks Across 144 Live Websites; Best Model at 33.3%
Agents score 70% in sandboxes; on ClawBench's live websites, the best model reaches only 33.3%.
- Nemotron-Personas-Korea — 7M Synthetic Personas from Official Korean Demographics
NVIDIA's Korean-language persona dataset: 7 million synthetic personas grounded in official government statistics, CC BY 4.0.
- OpenSpatial — 3M-sample spatial-intelligence data engine
A 3-million-sample, open-source data engine for 3D spatial reasoning; fine-tuned models show a relative gain of about 19% on spatial benchmarks.
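
A relative gain is measured against the baseline score, not in absolute percentage points, so it is easy to misread. The sketch below shows the arithmetic. The scores are hypothetical placeholders chosen to make the numbers easy to follow, not figures from the OpenSpatial release, and `relative_gain` is an illustrative helper rather than part of any published tooling.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement of `improved` over `baseline` as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical benchmark scores (assumed for illustration only).
baseline_score = 50.0
finetuned_score = 59.5

gain = relative_gain(baseline_score, finetuned_score)
absolute_points = finetuned_score - baseline_score

print(f"relative gain: {gain:.1%}, absolute gain: {absolute_points:.1f} points")
```

With these placeholder values, a 19% relative gain corresponds to 9.5 absolute points; the same relative gain on a lower baseline would be fewer points.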