Alibaba Qwen · 2026-06-16 · major

Qwen-Robot Suite — Alibaba's three foundation models for robots

Alibaba's Qwen team ships Qwen-Robot Suite, three foundation models for embodied AI: Qwen-RobotManip for manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld as a video world model. RobotManip and RobotNav come with public code; RobotWorld is published as a research paper.

Image: Qwen-Robot Suite banner showing the three Alibaba embodied-AI models RobotManip, RobotNav, and RobotWorld (credit: TechNode)

Three foundation models from Alibaba's Qwen team that control robot arms, navigate physical spaces, and predict what happens next.

Key specs

RoboChallenge Table30-v1 (Qwen-RobotManip)	Rank #1
VLN-CE RxR (Qwen-RobotNav)	76.5% SR
Qwen-RobotNav sizes	2B / 4B / 8B

Quick facts

Maker	Alibaba (Qwen team)
Models	RobotManip, RobotNav, RobotWorld
Domain	Embodied AI / robotics
RobotManip base	Qwen3.5-4B
RobotNav sizes	2B / 4B / 8B (Qwen3-VL base)
Availability	RobotManip + RobotNav as public repos; RobotWorld as paper

What is it?

Qwen-Robot Suite is Alibaba's first family of AI models built for robots instead of chatbots. The suite has three parts: Qwen-RobotManip for vision-language manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld as a video world model. Alibaba calls this a complete software stack for embodied AI.

How does it work?

Qwen-RobotManip is built on Qwen3.5-4B and turns heterogeneous robot data into a shared 80-dimensional action space, which lets one model train across different robot arms. Qwen-RobotNav is built on Qwen3-VL and ships in 2B, 4B, and 8B sizes with a controllable token budget so callers can trade compute for accuracy. Qwen-RobotWorld is a 60-layer MMDiT video model with a frozen Qwen2.5-VL encoder that uses natural language as the action interface and predicts future video given a goal.
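
To make the shared action space concrete, here is a minimal sketch of one common way to fit robots with different action widths into a single fixed-size vector: zero-padding plus a validity mask. Only the 80-dimensional width comes from the description above. The slot layout, the `Embodiment` class, and the `to_shared` / `from_shared` helpers are hypothetical illustrations, not Qwen-RobotManip's actual encoding.

```python
from dataclasses import dataclass

SHARED_DIM = 80  # shared action width stated in the article


@dataclass(frozen=True)
class Embodiment:
    """Hypothetical description of one robot's native action vector."""
    name: str
    dof: int      # number of native action dimensions
    offset: int   # assumed start index of this robot's slots in the shared vector


def to_shared(action, robot):
    """Place a native action into the shared vector; return (vector, mask)."""
    if len(action) != robot.dof:
        raise ValueError(f"{robot.name} expects {robot.dof} values, got {len(action)}")
    if robot.offset + robot.dof > SHARED_DIM:
        raise ValueError(f"{robot.name} does not fit in {SHARED_DIM} dimensions")
    vector = [0.0] * SHARED_DIM
    mask = [False] * SHARED_DIM
    for i, value in enumerate(action):
        vector[robot.offset + i] = float(value)
        mask[robot.offset + i] = True  # marks slots that carry real data
    return vector, mask


def from_shared(vector, robot):
    """Recover the native action by reading back this robot's slots."""
    return vector[robot.offset:robot.offset + robot.dof]


# Two made-up embodiments with different action widths.
single_arm = Embodiment("single_arm_7dof", dof=7, offset=0)
dual_arm = Embodiment("dual_arm_14dof", dof=14, offset=0)

vec, mask = to_shared([0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 1.0], single_arm)
print(len(vec), sum(mask))           # 80 7
print(from_shared(vec, single_arm))  # the original seven values
```

With a layout like this, a training loss can ignore the padded slots by using the mask, which is what would let batches mix data from different arms. Whether Qwen-RobotManip uses padding, a learned projection, or something else is not stated in the announcement.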

Why does it matter?

Most large language model labs treat robotics as someone else's problem. Alibaba is now pushing Qwen into the physical world with named, reproducible components instead of a single locked-in stack. For robotics teams this means a public starting point for vision-language-action (VLA), navigation, and world-model work, with top scores reported by Alibaba on RoboChallenge Table30-v1, VLN-CE RxR, EWMBench, and DreamGen Bench.

Who is it for?

Robotics researchers and embodied AI teams.

Frequently asked questions

What is Qwen-Robot Suite?: Qwen-Robot Suite is Alibaba's first set of AI foundation models built for robots. The suite contains three models: Qwen-RobotManip for controlling robot arms, Qwen-RobotNav for moving around physical spaces, and Qwen-RobotWorld for predicting what the world will look like next given a robot's actions.
Which model in Qwen-Robot Suite handles manipulation?: Qwen-RobotManip is the manipulation model. Qwen-RobotManip is built on Qwen3.5-4B and maps heterogeneous robot data into a shared 80-dimensional action space, which lets one model train across different robot arms. Alibaba says Qwen-RobotManip ranks first on the RoboChallenge Table30-v1 leaderboard.
How big is Qwen-RobotNav?: Qwen-RobotNav is released in three sizes: 2B, 4B, and 8B parameters, all built on the Qwen3-VL vision-language base. Qwen-RobotNav posts a 76.5% success rate on the VLN-CE RxR navigation benchmark and exposes a controllable token-budget interface so callers can trade compute for accuracy at inference time.
Is Qwen-Robot Suite open source?: Qwen-RobotManip and Qwen-RobotNav ship with public code, according to Alibaba's announcement coverage. Qwen-RobotWorld is presented as a research paper only, without released code at launch. The license and exact weight-distribution terms are not stated in the Qwen blog posts; check the Qwen GitHub before relying on it for commercial use.
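
The token-budget interface mentioned for Qwen-RobotNav can be pictured with a small sketch. The announcement does not say how the budget is enforced, so this example assumes one plausible mechanism: subsampling the history of camera frames so that the total number of visual tokens stays under a caller-chosen limit. The function `select_frames` and its parameters are hypothetical and are not part of any Qwen API.

```python
def select_frames(num_frames, tokens_per_frame, token_budget):
    """Return indices of history frames to keep so the total stays within budget.

    Assumed policy: keep as many frames as the budget allows, always include
    the most recent frame, and spread the rest evenly over the history.
    """
    if num_frames <= 0 or tokens_per_frame <= 0:
        raise ValueError("num_frames and tokens_per_frame must be positive")
    keep = min(num_frames, token_budget // tokens_per_frame)
    if keep < 1:
        raise ValueError("budget is too small for even one frame")
    if keep == 1:
        return [num_frames - 1]
    last = num_frames - 1
    # Evenly spaced integer indices from the oldest frame (0) to the newest (last).
    return [(i * last) // (keep - 1) for i in range(keep)]


# 32 frames of history at 64 tokens each, under two different budgets.
print(select_frames(32, 64, 512))   # [0, 4, 8, 13, 17, 22, 26, 31]
print(select_frames(32, 64, 128))   # [0, 31]
```

A larger budget keeps more of the history and costs more compute per step; a smaller one keeps only a few frames. That is the compute-for-accuracy trade the FAQ answer describes, under the frame-subsampling assumption made here.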

Try it

https://qwen.ai/blog?id=qwen-robotmanip