Peking University, Kling/Kuaishou, HKUST · 2026-04-06 · notable
OpenWorldLib — unified world-model codebase
Apache-2.0 framework that unifies world-model research under one inference stack: interactive video, 3D generation, Vision-Language-Action (VLA) models and multimodal reasoning all run through the same six modules.
One codebase, one definition, six modules — an attempt to pull scattered world-model research into a single inference framework.
Key specs
| Spec | Value |
|---|---|
| License | Apache-2.0 |
| GitHub stars | ~590 |
What is it?
OpenWorldLib is a unified codebase and framework for world models, from authors at Peking University, the Kling team at Kuaishou, HKUST, Tsinghua, NUS and other institutions. The paper first fixes a definition of a world model: a system centered on perception, with action-conditioned simulation and long-term memory. It then ships a reference framework that implements that definition across several previously disjoint research areas.
How does it work?
The framework decomposes a world model into six modules: Operator (input validation), Synthesis (visual/audio/action generation), Reasoning (general, spatial and audio), Representation (3D reconstruction and simulation), Memory (context store and retrieval) and Pipeline (end-to-end orchestration). Interactive video generation, 3D scene generation, Vision-Language-Action models and multimodal reasoning all run through the same invocation API. Code templates for each module are provided; the repo is Apache-2.0 and already sits around 590 stars.
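To make the six-module split concrete, here is a minimal Python sketch of how such a decomposition could be wired together. It is not OpenWorldLib's actual API: every class, function and field name below is hypothetical, the module bodies are placeholders, and the order in which the pipeline calls the modules is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Memory:
    """Memory: context store with naive 'most recent' retrieval (hypothetical)."""
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def store(self, item: Dict[str, Any]) -> None:
        self.entries.append(item)

    def retrieve(self, k: int = 1) -> List[Dict[str, Any]]:
        # Return up to k of the most recent entries, oldest first.
        return self.entries[-k:] if k > 0 else []


def operator(request: Dict[str, Any]) -> Dict[str, Any]:
    """Operator: validate the input before any other module runs."""
    if "task" not in request or "prompt" not in request:
        raise ValueError("request needs 'task' and 'prompt' keys")
    return request


def reasoning(request: Dict[str, Any], context: List[Dict[str, Any]]) -> str:
    """Reasoning: placeholder that turns a request plus context into a plan."""
    return f"plan for {request['task']} using {len(context)} memory item(s)"


def synthesis(plan: str) -> str:
    """Synthesis: placeholder for visual/audio/action generation."""
    return f"<generated output from: {plan}>"


def representation(output: str) -> Dict[str, Any]:
    """Representation: placeholder for 3D reconstruction or simulation state."""
    return {"scene": output, "format": "placeholder-3d"}


@dataclass
class Pipeline:
    """Pipeline: end-to-end orchestration; each stage can be swapped out."""
    memory: Memory = field(default_factory=Memory)
    reason: Callable[[Dict[str, Any], List[Dict[str, Any]]], str] = reasoning
    synthesize: Callable[[str], str] = synthesis
    represent: Callable[[str], Dict[str, Any]] = representation

    def run(self, request: Dict[str, Any]) -> Dict[str, Any]:
        request = operator(request)            # 1. validate input
        context = self.memory.retrieve(k=2)    # 2. fetch prior context
        plan = self.reason(request, context)   # 3. reason over request + context
        output = self.synthesize(plan)         # 4. generate
        state = self.represent(output)         # 5. build a scene/state record
        self.memory.store({"request": request, "state": state})  # 6. remember
        return state


if __name__ == "__main__":
    pipeline = Pipeline()
    print(pipeline.run({"task": "interactive-video", "prompt": "a rainy street"}))
    # Swapping a single stage leaves the rest of the pipeline untouched.
    custom = Pipeline(synthesize=lambda plan: "custom generator output")
    print(custom.run({"task": "3d-scene", "prompt": "a small room"}))
```

Because every task enters through the same `run` call in this sketch, replacing one stage (here the hypothetical `synthesize` callable) does not require changes to the others, which is the property the shared invocation API is meant to provide.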
Why does it matter?
World-model research has been fragmented: video folks, 3D folks and VLA folks each write their own harness. A common inference stack and vocabulary is mundane-sounding infrastructure, but it could make the next year of world-model papers directly comparable, and it lets downstream teams plug one component in and swap others out without rewriting the pipeline.
Who is it for?
World-model researchers, embodied-AI teams and simulation-stack builders.
Try it
github.com/OpenDCAI/OpenWorldLib