AI/TLDR

Allen Institute for AI · 2026-05-05 · major

MolmoAct 2 — Ai2's Fully Open Bimanual Robotics Model With 720h Open Dataset

Ai2 released MolmoAct 2, a fully open Vision-Language-Action model for real-world robots. It outperforms π₀.₅ on seven benchmarks, hits 87.1% on real-world Franka tasks, runs up to 37x faster than its predecessor, and ships with the largest open bimanual robot dataset to date.

Image: Ai2 MolmoAct 2 visualization showing a bimanual robot manipulating objects with action and depth tokens overlaid. Credit: Allen Institute for AI

An open VLA model from Ai2 that runs two-armed robots in the real world — weights, code and 720h of bimanual data on day one.

Key specs

License: Apache 2.0
LIBERO average: 97.2%
Real-world YAM: 50.1%
Real-world Franka: 87.1%
Speedup vs. v1: up to 37x
MolmoAct 2-Bimanual YAM dataset hours: 720+
Embodied reasoning average: 63.8%

What is it?

MolmoAct 2 is an action reasoning model from the Allen Institute for AI that takes camera frames and a natural-language instruction and outputs continuous robot actions. It targets practical deployment on platforms like the bimanual YAM, low-cost SO-100/101 arms, and the Franka. Ai2 also released the MolmoAct 2-Bimanual YAM dataset (720+ hours and 146,000 annotations across 28 tasks), which Ai2 calls the largest open-source bimanual manipulation dataset published to date.

How does it work?

The model is built on Molmo 2-ER, a new embodied-reasoning Molmo variant trained on 3.3M samples of pointing, detection and abstract spatial reasoning. A flow-matching continuous-action expert is grafted onto the discrete-token VLM via per-layer KV-cache conditioning, with a 'specialize-then-rehearse' recipe that preserves general VLM skills. A separate OpenFAST tokenizer is trained across five embodiments. The MolmoAct 2-Think variant only re-predicts depth tokens for parts of the scene that have changed, cutting latency.
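To make the flow-matching idea concrete, here is a minimal, generic sketch of how a flow-matching sampler turns noise into a continuous action by integrating a velocity field from t=0 to t=1. It is not Ai2's implementation: `toy_velocity`, `sample_action`, the 3-value action and the step count are all hypothetical, and the known target stands in for what a real action expert would infer from the VLM's features.

```python
import random


def toy_velocity(action, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise to a single target action, the
    ideal velocity at time t points at the target and scales as 1 / (1 - t).
    A real model would predict this from images and the instruction.
    """
    return [(g - a) / (1.0 - t) for a, g in zip(action, target)]


def sample_action(target, num_steps=10, seed=0):
    """Integrate the velocity field from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise, one value per action dimension.
    action = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # t stays strictly below 1, so 1 - t is never zero
        v = toy_velocity(action, t, target)
        action = [a + dt * vi for a, vi in zip(action, v)]
    return action


target_action = [0.25, -0.10, 0.40]  # made-up 3-dimensional action
result = sample_action(target_action)
print(result)
```

With this toy field the Euler integration lands on the target action up to floating-point error. In a learned system the number of integration steps is a trade-off between action quality and inference latency.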

Why does it matter?

Frontier robotics has been gated by proprietary systems (π₀.₅, Gemini Robotics ER) and small private datasets. Ai2 ships competitive numbers (Molmo 2-ER beats GPT-5 and Gemini Robotics ER-1.5 averaged across 13 embodied benchmarks) and also the data and training code, which is what lets smaller labs and university groups reproduce the results. Stanford's Cong Lab is already piloting it for CRISPR wetlab work.

Who is it for?

Robotics researchers, embodied-AI labs, and anyone building VLAs on top of open weights.

Try it

https://huggingface.co/collections/allenai/molmoact2-models

Tags

  • ai2
  • allen-ai
  • molmoact
  • vision-language-action
  • robotics
  • bimanual
  • open-source
  • embodied-ai
  • molmo2-er
