AI/TLDR

Alibaba Qwen · 2026-06-16 · major

Qwen-Robot Suite — Alibaba's three foundation models for robots

Alibaba's Qwen team has released Qwen-Robot Suite, three foundation models for embodied AI: Qwen-RobotManip for manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld, a video world model. RobotManip and RobotNav come with public repos; RobotWorld is available as a paper only.

[Image: Qwen-Robot Suite banner with the three Alibaba embodied-AI models RobotManip, RobotNav, and RobotWorld. TechNode]

Three foundation models from Alibaba's Qwen team that control robot arms, steer robots through physical spaces, and predict what happens next.

Key specs

RoboChallenge Table30-v1 (Qwen-RobotManip): Rank #1
VLN-CE RxR (Qwen-RobotNav): 76.5% SR (success rate)
Qwen-RobotNav sizes: 2B / 4B / 8B

Quick facts

Maker: Alibaba (Qwen team)
Models: RobotManip, RobotNav, RobotWorld
Domain: Embodied AI / robotics
RobotManip base: Qwen3.5-4B
RobotNav sizes: 2B / 4B / 8B (Qwen3-VL base)
Availability: RobotManip + RobotNav as public repos; RobotWorld as paper

What is it?

Qwen-Robot Suite is Alibaba's first family of AI models built for robots instead of chatbots. The suite has three parts: Qwen-RobotManip for vision-language manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld as a video world model. Alibaba calls this a complete software stack for embodied AI.

How does it work?

Qwen-RobotManip is built on Qwen3.5-4B and turns heterogeneous robot data into a shared 80-dimensional action space, which lets one model train across different robot arms. Qwen-RobotNav is built on Qwen3-VL and ships in 2B, 4B, and 8B sizes with a controllable token budget so callers can trade compute for accuracy. Qwen-RobotWorld is a 60-layer MMDiT video model with a frozen Qwen2.5-VL encoder that uses natural language as the action interface and predicts future video given a goal.
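To make the shared action space idea concrete, here is a minimal sketch of how actions from different robots could be packed into one fixed 80-dimensional vector. Only the size 80 comes from the description above. The embodiment names, the slot layout, the zero padding, and the masked loss are all assumptions for illustration, not Alibaba's published design.

```python
SHARED_DIM = 80  # size of the shared action space stated in the article

# Hypothetical slot layout: which shared indices each robot's native
# action values occupy. Qwen-RobotManip's real layout is not given here.
EMBODIMENT_SLOTS = {
    "single_arm": list(range(0, 8)),   # assumed: 7 joints + 1 gripper
    "dual_arm": list(range(0, 16)),    # assumed: two such arms
}


def to_shared(action, embodiment):
    """Pack a robot-specific action into the shared vector plus a validity mask."""
    slots = EMBODIMENT_SLOTS[embodiment]
    if len(action) != len(slots):
        raise ValueError(
            f"{embodiment} expects {len(slots)} values, got {len(action)}"
        )
    shared = [0.0] * SHARED_DIM      # unused slots stay zero-padded
    mask = [False] * SHARED_DIM      # True only where this robot has a real value
    for slot, value in zip(slots, action):
        shared[slot] = float(value)
        mask[slot] = True
    return shared, mask


def from_shared(shared, embodiment):
    """Read a robot-specific action back out of the shared vector."""
    return [shared[slot] for slot in EMBODIMENT_SLOTS[embodiment]]


def masked_mse(pred, target, mask):
    """Mean squared error over the slots this robot actually uses."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not errors:
        raise ValueError("mask selects no slots")
    return sum(errors) / len(errors)
```

With a layout like this, a single-arm and a dual-arm dataset both yield 80-dimensional targets, and the mask keeps padded slots from contributing to the training loss. That is one plausible way a single model could train across different arms.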

Why does it matter?

Most large language model labs treat robotics as someone else's problem. Alibaba is now pushing Qwen into the physical world with named, reproducible components instead of a single locked-in stack. For robotics teams this means a public starting point for vision-language-action (VLA), navigation, and world-model work. Alibaba reports top scores for the suite on RoboChallenge Table30-v1, VLN-CE RxR, EWMBench, and DreamGen Bench.

Who is it for?

Robotics researchers and embodied AI teams.

Frequently asked questions

What is Qwen-Robot Suite?
Qwen-Robot Suite is Alibaba's first set of AI foundation models built for robots. The suite contains three models: Qwen-RobotManip for controlling robot arms, Qwen-RobotNav for moving around physical spaces, and Qwen-RobotWorld for predicting future video of a scene from a natural-language description of the robot's action or goal.
Which model in Qwen-Robot Suite handles manipulation?
Qwen-RobotManip is the manipulation model. It is built on Qwen3.5-4B and maps heterogeneous robot data into a shared 80-dimensional action space, which lets one model train across different robot arms. Alibaba says Qwen-RobotManip ranks first on the RoboChallenge Table30-v1 leaderboard.
How big is Qwen-RobotNav?
Qwen-RobotNav is released in three sizes: 2B, 4B, and 8B parameters, all built on the Qwen3-VL vision-language base. Qwen-RobotNav posts a 76.5% success rate on the VLN-CE RxR navigation benchmark and exposes a controllable token-budget interface so callers can trade compute for accuracy at inference time.
Is Qwen-Robot Suite open source?
Qwen-RobotManip and Qwen-RobotNav ship with public code, according to coverage of Alibaba's announcement. Qwen-RobotWorld is presented as a research paper only, without released code at launch. The license and exact weight-distribution terms are not stated in the Qwen blog posts; check the Qwen GitHub before relying on any of the models for commercial use.
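
The token-budget trade-off mentioned for Qwen-RobotNav can be illustrated with a small sketch. Nothing here describes how Qwen-RobotNav implements its budget. The function name `select_frames`, the fixed per-frame token cost, and the evenly spaced sampling strategy are assumptions chosen to show the general idea: a smaller budget means fewer observation frames reach the model.

```python
def select_frames(num_frames, tokens_per_frame, token_budget):
    """Pick frame indices so total visual tokens stay within the budget.

    Hypothetical helper: keeps frames evenly spaced across the history and
    always keeps the most recent frame when at least one frame fits.
    """
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    max_frames = token_budget // tokens_per_frame
    if max_frames <= 0 or num_frames <= 0:
        return []                          # budget too small for any frame
    if max_frames >= num_frames:
        return list(range(num_frames))     # everything fits
    if max_frames == 1:
        return [num_frames - 1]            # keep only the latest observation
    # Evenly spaced indices from the first frame to the last one.
    step = (num_frames - 1) / (max_frames - 1)
    return sorted({round(i * step) for i in range(max_frames)})


# Example: 10 frames at an assumed 100 tokens each, under two budgets.
low_budget = select_frames(10, 100, 400)
high_budget = select_frames(10, 100, 1000)
```

A caller with limited compute would pass a small `token_budget` and accept a coarser view of the trajectory, while a caller chasing accuracy would raise it until every frame fits.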

Try it

https://qwen.ai/blog?id=qwen-robotmanip


Tags

  • model
  • robotics
  • embodied-ai
  • vla
  • world-model
  • navigation
  • vision-language-action
  • qwen
  • alibaba
  • open-weights
