Alibaba Qwen · 2026-06-16 · major
Qwen-Robot Suite — Alibaba's three foundation models for robots
Alibaba's Qwen team ships Qwen-Robot Suite, three open foundation models for embodied AI: Qwen-RobotManip for manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld as a video world model.

Three open foundation models from Alibaba's Qwen team that control robot arms, navigate physical spaces, and predict what happens next.
Key specs
| Spec | Value |
|---|---|
| RoboChallenge Table30-v1 (Qwen-RobotManip) | Rank #1 |
| VLN-CE RxR (Qwen-RobotNav) | 76.5% success rate (SR) |
| Qwen-RobotNav sizes | 2B / 4B / 8B |
Quick facts
| Field | Value |
|---|---|
| Maker | Alibaba (Qwen team) |
| Models | RobotManip, RobotNav, RobotWorld |
| Domain | Embodied AI / robotics |
| RobotManip base | Qwen3.5-4B |
| RobotNav sizes | 2B / 4B / 8B (Qwen3-VL base) |
| Availability | RobotManip + RobotNav as public repos; RobotWorld as paper |
What is it?
Qwen-Robot Suite is Alibaba's first family of AI models built for robots instead of chatbots. The suite has three parts: Qwen-RobotManip for vision-language manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld as a video world model. Alibaba calls this a complete software stack for embodied AI.
How does it work?
Qwen-RobotManip is built on Qwen3.5-4B and turns heterogeneous robot data into a shared 80-dimensional action space, which lets one model train across different robot arms. Qwen-RobotNav is built on Qwen3-VL and ships in 2B, 4B, and 8B sizes with a controllable token budget so callers can trade compute for accuracy. Qwen-RobotWorld is a 60-layer MMDiT video model with a frozen Qwen2.5-VL encoder that uses natural language as the action interface and predicts future video given a goal.
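The article does not say how Qwen-RobotManip maps each robot's native actions into the shared 80-dimensional space. As a rough illustration only, the sketch below assumes one common cross-embodiment approach: each robot's action is zero-padded into a fixed-length vector, with a mask marking which slots are real. The `EmbodimentSpec` class, the slot layout, and both example robots are hypothetical and are not taken from the Qwen release.

```python
from dataclasses import dataclass

SHARED_DIM = 80  # size of the shared action space, as stated in the article


@dataclass(frozen=True)
class EmbodimentSpec:
    """Hypothetical layout: where one robot's action dims sit in the shared vector."""
    name: str
    offset: int  # first shared-space index used by this robot (assumed)
    dof: int     # number of native action dimensions


def to_shared(action, spec):
    """Zero-pad a native action into a SHARED_DIM vector; return (vector, mask)."""
    if len(action) != spec.dof:
        raise ValueError(f"{spec.name} expects {spec.dof} values, got {len(action)}")
    if spec.offset < 0 or spec.offset + spec.dof > SHARED_DIM:
        raise ValueError(f"{spec.name} does not fit in {SHARED_DIM} dimensions")
    vector = [0.0] * SHARED_DIM
    mask = [0] * SHARED_DIM  # 1 marks a slot that carries a real action value
    for i, value in enumerate(action):
        vector[spec.offset + i] = float(value)
        mask[spec.offset + i] = 1
    return vector, mask


def from_shared(vector, spec):
    """Recover the native action by slicing this robot's slots back out."""
    return vector[spec.offset:spec.offset + spec.dof]


# Two made-up embodiments that share the same model input and output size.
single_arm = EmbodimentSpec("single_arm_7dof", offset=0, dof=7)
dual_arm = EmbodimentSpec("dual_arm_14dof", offset=0, dof=14)

vec_a, mask_a = to_shared([0.1] * 7, single_arm)
vec_b, mask_b = to_shared([0.2] * 14, dual_arm)
```

Under this assumption, a training loss would be computed only over masked-in slots, so a 7-DoF arm and a 14-DoF dual-arm setup can appear in the same batch. The real model may use a different scheme, such as learned per-robot projections.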
Why does it matter?
Most large language model labs treat robotics as someone else's problem. Alibaba is now pushing Qwen into the physical world with named, reproducible components instead of a single locked-in stack. For robotics teams, this means a public starting point for vision-language-action (VLA), navigation, and world-model work, backed by what Alibaba reports as top scores on RoboChallenge Table30-v1, VLN-CE RxR, EWMBench, and DreamGen Bench.
Who is it for?
Robotics researchers and embodied AI teams.
Frequently asked questions
- What is Qwen-Robot Suite?
- Qwen-Robot Suite is Alibaba's first set of AI foundation models built for robots. The suite contains three models: Qwen-RobotManip for controlling robot arms, Qwen-RobotNav for moving around physical spaces, and Qwen-RobotWorld for predicting what the world will look like next given a robot's actions.
- Which model in Qwen-Robot Suite handles manipulation?
- Qwen-RobotManip is the manipulation model. It is built on Qwen3.5-4B and maps heterogeneous robot data into a shared 80-dimensional action space, which lets one model train across different robot arms. Alibaba says Qwen-RobotManip ranks first on the RoboChallenge Table30-v1 leaderboard.
- How big is Qwen-RobotNav?
- Qwen-RobotNav is released in three sizes: 2B, 4B, and 8B parameters, all built on the Qwen3-VL vision-language base. Qwen-RobotNav posts a 76.5% success rate on the VLN-CE RxR navigation benchmark and exposes a controllable token-budget interface so callers can trade compute for accuracy at inference time.
- Is Qwen-Robot Suite open source?
- Partly. Qwen-RobotManip and Qwen-RobotNav ship with public code, according to coverage of Alibaba's announcement. Qwen-RobotWorld is presented as a research paper only, with no code released at launch. The Qwen blog posts do not state the license or the exact weight-distribution terms; check the Qwen GitHub repositories before relying on any of the models for commercial use.
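The token-budget trade-off described for Qwen-RobotNav can be pictured with a small sketch. The article does not explain how the budget is enforced, so this assumes one plausible mechanism: keeping only as many evenly spaced history frames as fit under the budget, always including the oldest and the most recent frame. The function name and the per-frame token cost are hypothetical and are not part of any Qwen API.

```python
def frames_within_budget(num_frames, tokens_per_frame, token_budget):
    """Pick evenly spaced frame indices whose total token cost fits the budget.

    Assumes every frame costs the same number of visual tokens, which is a
    simplification made for this illustration.
    """
    if num_frames <= 0 or tokens_per_frame <= 0:
        raise ValueError("num_frames and tokens_per_frame must be positive")
    max_frames = token_budget // tokens_per_frame
    if max_frames < 1:
        raise ValueError("token budget is too small for a single frame")
    keep = min(num_frames, max_frames)
    if keep == 1:
        return [num_frames - 1]  # keep only the most recent observation
    # Integer arithmetic spreads the kept frames from index 0 to the last frame.
    return [i * (num_frames - 1) // (keep - 1) for i in range(keep)]


# A tighter budget keeps fewer history frames (less compute, less context).
small = frames_within_budget(num_frames=10, tokens_per_frame=64, token_budget=256)
large = frames_within_budget(num_frames=10, tokens_per_frame=64, token_budget=1024)
```

With these made-up numbers, the 256-token budget keeps 4 of the 10 frames and the 1024-token budget keeps all 10. A caller would raise the budget when accuracy matters more than latency.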
Try it
https://qwen.ai/blog?id=qwen-robotmanip