AI/TLDR

Qwen · 2026-06-23 · major

Qwen-AgentWorld — language world models that simulate seven agent domains

Qwen-AgentWorld is a pair of open-weight world models (35B-A3B and 397B-A17B) that simulate seven agent environments — MCP, search, terminal, software engineering, Android, web, and OS — through chain-of-thought reasoning.


Open-weight world models from Qwen that simulate seven agent environments with one set of weights per model size, each with a 262K-token context window.

Key specs

Context window: 262K
Big model: 397B-A17B
Small model: 35B-A3B
Domains: 7

Quick facts

Maker: Qwen team, Alibaba Cloud
Models: Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B
Architecture: Mixture-of-experts, 8 routed + 1 shared expert per token
Context window: 262,144 tokens
Domains: MCP, search, terminal, SWE, Android, web, OS
Training data: 10M+ environment trajectories
License: Apache-2.0
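The "8 routed + 1 shared expert per token" line can be illustrated with a toy router. This is a minimal sketch under stated assumptions: the total expert count (64 here), the softmax over the top-k router logits, and the scalar "experts" are all hypothetical and not taken from the Qwen-AgentWorld configuration. It only shows the general pattern of picking the 8 highest-scoring routed experts per token and always adding one shared expert.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, routed_experts, shared_expert, top_k=8):
    """Toy MoE layer for one scalar token representation x.

    The router scores every routed expert, keeps the top_k, renormalizes
    their gates with a softmax, and adds the always-on shared expert.
    Returns (output, indices of the chosen routed experts).
    """
    logits = [w * x for w in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    routed = sum(g * routed_experts[i](x) for g, i in zip(gates, top))
    return routed + shared_expert(x), top


random.seed(0)
NUM_EXPERTS = 64  # hypothetical; not the real expert count
router_weights = [random.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]
scales = [random.uniform(0.5, 1.5) for _ in range(NUM_EXPERTS)]
routed_experts = [(lambda v, s=s: s * v) for s in scales]  # each "expert" scales its input
shared_expert = lambda v: 0.1 * v

output, chosen = moe_forward(2.0, router_weights, routed_experts, shared_expert)
print(len(chosen), sorted(chosen))
```

Only the 8 selected experts do work for this token, which is the sense in which a mixture-of-experts model has far fewer active parameters than total parameters.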

What is it?

Qwen-AgentWorld is a pair of mixture-of-experts world models — 35B-A3B and 397B-A17B — that the Qwen team has released alongside an arXiv paper, GitHub code, and the AgentWorldBench evaluation suite. Both models share a 262K context window and target seven distinct agent environments.

How does it work?

The pipeline runs in three stages: continued pretraining absorbs state-transition data, supervised fine-tuning activates next-state reasoning, and reinforcement learning with hybrid rubric-and-rule rewards tunes simulation fidelity. Training inputs come from 10M+ trajectories collected by running five frontier models across nine established benchmarks.
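To make "state-transition data" concrete, here is a minimal sketch of how one recorded agent trajectory could be unrolled into next-state-prediction examples. The `Step` record, the `to_transition_examples` helper, the field names, and the bracketed prompt layout are hypothetical illustrations, not the format of the Qwen-AgentWorld training set.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str       # what the agent did, e.g. a shell command
    observation: str  # what the real environment returned afterwards


def to_transition_examples(domain, initial_state, steps):
    """Unroll one trajectory into (context, target) pairs.

    Each example asks the model to predict the observation that followed an
    action, given the initial state and every earlier action/observation.
    """
    examples = []
    history = [f"[domain] {domain}", f"[state] {initial_state}"]
    for step in steps:
        context = "\n".join(history + [f"[action] {step.action}"])
        examples.append((context, step.observation))
        history += [f"[action] {step.action}", f"[observation] {step.observation}"]
    return examples


steps = [Step("ls", "notes.txt"), Step("cat notes.txt", "hello")]
examples = to_transition_examples("terminal", "cwd=/home/user", steps)
for context, target in examples:
    print(context, "=>", target, sep="\n")
```

A trajectory with n steps yields n examples, each conditioned on the full prefix of the interaction, so the model learns to predict a transition from everything that came before it.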

Why does it matter?

Training agents normally means standing up real terminals, browsers, and Android emulators, which is slow, flaky, and expensive. A world model lets the agent loop run entirely in language, which is what could let reinforcement learning on tool-using agents scale the way pretraining has scaled for language models. Qwen-AgentWorld is the first open-weight language world model that spans more than one domain.
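Running the agent loop "entirely in language" can be sketched as an environment whose `step` method asks a world model for the next observation instead of executing the action. `SimulatedEnv`, the `WorldModel` callable type, and the prompt format are hypothetical; in practice the callable would wrap inference on a model such as Qwen-AgentWorld, which this sketch does not do. An offline stub stands in so the example runs anywhere.

```python
from typing import Callable, List

WorldModel = Callable[[str], str]  # prompt in, predicted next observation out


class SimulatedEnv:
    """Environment whose transitions are predicted by a language world model."""

    def __init__(self, domain: str, initial_state: str, world_model: WorldModel):
        self.world_model = world_model
        self.transcript: List[str] = [f"[domain] {domain}", f"[state] {initial_state}"]

    def step(self, action: str) -> str:
        self.transcript.append(f"[action] {action}")
        # No real terminal, browser, or emulator is touched: the model predicts the result.
        observation = self.world_model("\n".join(self.transcript))
        self.transcript.append(f"[observation] {observation}")
        return observation


def stub_world_model(prompt: str) -> str:
    """Offline stand-in for a real model: echoes the last action as a fake observation."""
    last_action = prompt.rsplit("[action] ", 1)[-1]
    return f"simulated output of: {last_action}"


env = SimulatedEnv("terminal", "cwd=/home/user", stub_world_model)
for action in ["ls", "pwd"]:
    print(env.step(action))
```

Because the agent only sees the `step` interface, swapping the stub for a real world model (or for a real terminal) does not change the agent's code, which is what makes simulated rollouts a drop-in replacement during training.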

Who is it for?

Agent researchers and reinforcement-learning engineers

Frequently asked questions

What is a language world model and what makes Qwen-AgentWorld different?
A language world model predicts the next state of an environment from text alone — file diffs after a shell command, the new DOM after a click, the next Android screen after a tap. Qwen-AgentWorld is the first open-weight model to do this across seven distinct agent domains in a single set of weights, rather than one model per environment.
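For terminal and software-engineering steps, one natural text encoding of "the next state" is a unified diff of the files an action touched. The sketch below uses Python's standard `difflib` to show what such a target could look like after a hypothetical `sed` edit to a hypothetical `config.toml`; whether Qwen-AgentWorld represents file states this way is an assumption, not something the release states.

```python
import difflib


def state_diff(before: str, after: str, path: str) -> str:
    """Render a file's state change as a unified diff string ('' if unchanged)."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


# Hypothetical action: sed -i 's/debug = false/debug = true/' config.toml
before = "debug = false\nport = 8080\n"
after = "debug = true\nport = 8080\n"
diff = state_diff(before, after, "config.toml")
print(diff)
```

A diff keeps the prediction target short and focused on what changed, rather than asking the model to reproduce every unchanged file in full.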
Which environments does Qwen-AgentWorld cover?
Qwen-AgentWorld is trained on seven agent domains: MCP tool calls, web search, the terminal, software engineering tasks, Android UIs, web browsing, and full operating systems. The same model simulates the next state across all seven, which is what lets a downstream agent be trained and evaluated against it without spinning up real environments.
How was Qwen-AgentWorld trained?
Qwen-AgentWorld is trained in three stages: continued pretraining injects state-transition dynamics and world knowledge, supervised fine-tuning activates next-state-prediction reasoning patterns, and reinforcement learning with hybrid rubric-and-rule rewards sharpens simulation fidelity. The training set is more than 10 million real interaction trajectories collected from five frontier models on nine benchmarks.
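The details of the "hybrid rubric-and-rule rewards" are not given here. The sketch below shows one plausible way such a reward could be combined, assuming a rule component that checks verifiable fields exactly (for example an exit code or working directory) and a rubric component that averages per-criterion grades. The field names, the 50/50 weighting, and the keyword grader (a stand-in for what would more likely be an LLM judge) are all hypothetical.

```python
from typing import Callable, Dict, List

Grader = Callable[[str, str], float]  # (predicted text, criterion) -> score in [0, 1]


def rule_score(predicted: Dict[str, str], reference: Dict[str, str], keys: List[str]) -> float:
    """Fraction of verifiable fields that the prediction matches exactly."""
    if not keys:
        return 1.0
    hits = sum(predicted.get(k) == reference.get(k) for k in keys)
    return hits / len(keys)


def rubric_score(text: str, rubric: List[str], grade: Grader) -> float:
    """Mean per-criterion grade for free-form parts of the predicted state."""
    if not rubric:
        return 1.0
    return sum(grade(text, criterion) for criterion in rubric) / len(rubric)


def hybrid_reward(predicted, reference, rule_keys, rubric, grade, rule_weight=0.5):
    """Weighted mix of exact rule checks and rubric grading."""
    rule = rule_score(predicted, reference, rule_keys)
    soft = rubric_score(predicted.get("output", ""), rubric, grade)
    return rule_weight * rule + (1.0 - rule_weight) * soft


def keyword_grader(text: str, criterion: str) -> float:
    """Offline stand-in for an LLM judge: full credit if the criterion text appears."""
    return 1.0 if criterion in text else 0.0


predicted = {"exit_code": "0", "cwd": "/home/user", "output": "notes.txt"}
reference = {"exit_code": "0", "cwd": "/tmp", "output": "notes.txt"}
reward = hybrid_reward(predicted, reference, ["exit_code", "cwd"], ["notes.txt"], keyword_grader)
print(reward)  # 0.75
```

Splitting the reward this way lets exactly checkable fields be scored strictly while leaving open-ended output (a page's content, a long log) to softer rubric grading.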
Where can I download the models and code?
Qwen-AgentWorld weights are published on Hugging Face as Qwen/Qwen-AgentWorld-35B-A3B and Qwen/Qwen-AgentWorld-397B-A17B, both under the Apache-2.0 license. The training and evaluation code lives in the QwenLM/Qwen-AgentWorld GitHub repo, and AgentWorldBench, the evaluation suite, is published as a Hugging Face dataset.

Try it

huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B


Tags

  • world-model
  • agents
  • qwen
  • alibaba
  • mixture-of-experts
  • reinforcement-learning
  • open-weights
  • apache-2.0
  • 262k-context
  • hf-trending
