AI/TLDR

HKU, Microsoft Research Asia · 2026-04-08 · notable

OpenSpatial — 3M-sample spatial-intelligence data engine

Open-source data engine plus a 3M-sample dataset for 3D spatial reasoning — models trained on it gain around 19% average relative improvement on spatial benchmarks.

Key specs

License: Apache-2.0
Samples: 3M
Relative lift: +19%

What is it?

OpenSpatial is an open-source data engine for building spatial-reasoning datasets, from a coalition led by researchers at HKU and Microsoft Research Asia. It converts 2D web imagery into 3D-annotated training samples and uses them to build a 3M-sample dataset (OpenSpatial-3M) covering five core spatial tasks: measurement, relationships, camera perception, multi-view consistency and scene-aware reasoning.

How does it work?

The engine uses 3D bounding boxes as its intermediate representation. For each source image, it lifts detected objects into 3D, generates examples for each of the five tasks above, and outputs them as training data a VLM can consume. The engine, the dataset and the training recipe are all released under Apache-2.0. Models fine-tuned on OpenSpatial-3M show a roughly 19% average relative improvement across standard spatial reasoning benchmarks.
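To make the "3D boxes as intermediate representation" idea concrete, here is a minimal Python sketch of how two of the five task types (measurement and relationships) could be derived from boxes that have already been lifted into 3D. It is an illustration only, not OpenSpatial's actual code or API: the `Box3D` class, the function names, the output dictionary layout and the camera-frame convention (x to the right, z forward, units in metres) are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Box3D:
    """Hypothetical 3D box; not taken from the OpenSpatial codebase."""
    label: str
    center: Tuple[float, float, float]  # assumed camera frame: x right, y down, z forward (metres)
    size: Tuple[float, float, float]    # width, height, depth (metres)


def measurement_sample(a: Box3D, b: Box3D) -> Dict[str, str]:
    """Build a distance question from the Euclidean distance between box centres."""
    distance = math.dist(a.center, b.center)
    return {
        "task": "measurement",
        "question": f"How far apart are the {a.label} and the {b.label}, in metres?",
        "answer": f"{distance:.2f}",
    }


def relationship_sample(a: Box3D, b: Box3D) -> Dict[str, str]:
    """Build a left/right question by comparing the x coordinates of the centres."""
    answer = "left" if a.center[0] < b.center[0] else "right"
    return {
        "task": "relationship",
        "question": f"Is the {a.label} to the left or right of the {b.label}?",
        "answer": answer,
    }


def build_samples(boxes: List[Box3D]) -> List[Dict[str, str]]:
    """Emit one measurement and one relationship sample per ordered pair of boxes."""
    samples = []
    for i, a in enumerate(boxes):
        for b in boxes[i + 1:]:
            samples.append(measurement_sample(a, b))
            samples.append(relationship_sample(a, b))
    return samples


if __name__ == "__main__":
    scene = [
        Box3D("chair", (-1.0, 0.0, 3.0), (0.5, 0.9, 0.5)),
        Box3D("table", (0.5, 0.0, 5.0), (1.2, 0.8, 0.8)),
    ]
    for sample in build_samples(scene):
        print(sample)
```

The point of the sketch is that once objects live in a shared 3D frame, ground-truth answers come from plain geometry rather than manual labelling. A real engine would also have to handle the lifting step itself, occlusion, and the camera-perception, multi-view and scene-aware tasks, none of which are shown here.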

Why does it matter?

Spatial reasoning is one of the weakest parts of current VLMs, and it is exactly what embodied agents and world models need. A principled, open data engine plus a 3M-sample dataset is the kind of infrastructure that gets copied into a lot of downstream training runs.

Who is it for?

VLM and embodied-AI researchers, world-model teams.

Try it

github.com/VINHYU/OpenSpatial

Tags

  • dataset
  • spatial
  • 3d
  • embodied-ai
