Overview
Qwen3-Max is the largest model in Alibaba's Qwen family, crossing the 1-trillion-parameter mark with a Mixture-of-Experts design that activates only a fraction of those parameters per token. It launched in September 2025 — first as a preview on September 5, then as a full production release later that month — and is positioned as the highest-performing tier of the Qwen-Max line for complex, multi-step tasks. Unlike most of Alibaba's Qwen releases, Qwen3-Max is closed-weight: it runs only through Alibaba Cloud's Model Studio API, Qwen Chat, and third-party gateways such as OpenRouter, with no local deployment.
The model was pretrained on roughly 36 trillion tokens (about twice the data used for Qwen2.5) with a multilingual, coding, and STEM/reasoning emphasis. It ships in two flavors: Qwen3-Max-Instruct, a fast non-thinking model aimed at coding and agentic tool use, and Qwen3-Max-Thinking, a reasoning variant that can call a code interpreter and use parallel test-time compute. Through the API it exposes a 262,144-token context window and up to 32,768 output tokens.
Qwen3-Max is a text-only model — vision tasks in the Qwen3 generation are handled by separate Qwen3-VL models. On release, the preview version of Qwen3-Max reached the global top three on the LMArena text leaderboard, and Alibaba reported strong day-one results on coding and agentic benchmarks. Treat the most aggressive math claims as vendor-reported rather than independently verified.
| Released | 2025-09 |
|---|---|
| License | Proprietary (closed weights), API access via Alibaba Cloud Model Studio |
| Weights | API only |
| Parameters | Over 1 trillion (MoE, sparse activation) |
| Context | 262K |
| Max output | 32K |
| Architecture | Sparse Mixture-of-Experts (MoE) transformer that activates a subset of its 1T+ parameters per token. Pretrained on roughly 36 trillion tokens (about 2x Qwen2.5). Shipped in two variants: Qwen3-Max-Instruct (no extended reasoning) and Qwen3-Max-Thinking (tool-augmented reasoning mode). |
| Knowledge cutoff | Not officially disclosed |
| Modalities | Text |
| Status | Available |
Benchmarks
- SWE-bench Verified (Qwen3-Max-Instruct)69.6%
- Tau2-bench (Qwen3-Max-Instruct, agentic tool-calling)74.8%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $1.20 / 1M tokens (0-32K input tier; rises to $2.40 at 32K-128K and $3.00 at 128K-252K) per 1M tokens |
|---|---|
| Output | $6.00 / 1M tokens (0-32K tier; up to $15.00 at higher tiers) per 1M tokens |
Tiered by input length on Alibaba Cloud Model Studio (International deployment). Batch invocation is charged at 50% of real-time pricing. Global-deployment pricing is lower.
Strengths
- Trillion-parameter MoE scale with efficient sparse activation per token
- Strong software-engineering performance (69.6 on SWE-bench Verified, Instruct variant)
- Top-tier agentic tool-calling (74.8 on Tau2-bench, Instruct variant)
- Large 262K-token context window for long documents and codebases
- Two variants — fast Instruct and reasoning-focused Thinking — under one model line
- Native function calling, structured JSON output, and web-search support via Model Studio
Best for
- Agentic workflows and multi-tool orchestration
- Software engineering and code generation across large repositories
- Long-context document analysis and summarization
- Complex multi-step reasoning and STEM problem solving
- Multilingual text generation and assistance
- Building API-driven assistants with structured output and function calling
How to access
| Provider | Model ID |
|---|---|
| Alibaba Cloud Model Studio ↗ | qwen3-max |
| OpenRouter ↗ | qwen/qwen3-max |
Qwen-Max — every version
The full lineage of the Qwen-Max line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Qwen3.7-Maxcurrent | 2026-05 | — | Proprietary |
| Qwen3-Max | 2025-09 | — | Proprietary |
| Qwen2.5-Max | 2025-01-29 | — | Proprietary |
FAQ
Is Qwen3-Max open source?
No. Unlike many other Qwen models, Qwen3-Max is closed-weight. It is available only through APIs — Alibaba Cloud Model Studio, Qwen Chat, and gateways like OpenRouter — and does not support local deployment.
How many parameters does Qwen3-Max have?
Alibaba states Qwen3-Max crosses the 1-trillion-parameter mark using a Mixture-of-Experts (MoE) design, so only a sparse subset of those parameters is activated per token. A full technical report with the exact figure was not published at launch.
What is the context window of Qwen3-Max?
Through Alibaba Cloud Model Studio, Qwen3-Max supports a 262,144-token context window (about 262K) and up to 32,768 output tokens.
Does Qwen3-Max support images or video?
No. Qwen3-Max is a text-only model. Vision and video tasks in the Qwen3 generation are handled by separate Qwen3-VL models, not by Qwen3-Max.