Overview
Qwen3.5 is the flagship open-weight model family released by Alibaba's Qwen team on 16 February 2026. The headline model, Qwen3.5-397B-A17B, is a sparse mixture-of-experts network with 397 billion total parameters that activates only about 17 billion per token, giving it the capability of a very large model at the inference cost of a much smaller one. It ships under the Apache 2.0 license with weights on Hugging Face, GitHub, and ModelScope.
Architecturally, Qwen3.5 moves away from a pure softmax-attention transformer. Roughly three quarters of its layers use Gated DeltaNet linear attention, with the remaining quarter using conventional grouped-query softmax attention and RoPE. This hybrid design is what lets Alibaba claim large throughput gains over the dense Qwen3-Max at long context lengths while matching its quality. It is also the first open-weight Qwen model with native vision and video input, folding the previous text-only Qwen3 and vision-focused Qwen3-VL lines into one early-fusion backbone.
The flagship was followed by smaller open-weight variants — Qwen3.5-122B-A10B, 35B-A3B, and 27B on 24 February 2026, then 9B, 4B, 2B, and 0.8B on 2 March 2026 — covering everything from data-center MoE to edge-sized dense models. A hosted, proprietary Qwen3.5-Plus endpoint extends the context window to 1M tokens. The series supports 201 languages and dialects (up from 119 in Qwen3) and runs in both reasoning and non-reasoning modes.
| Released | 2026-02-16 |
|---|---|
| License | Apache-2.0 |
| Weights | Open weights |
| Parameters | 397B total / 17B active (MoE, 512 experts) |
| Context | 256K |
| Architecture | Hybrid sparse mixture-of-experts. Most layers use Gated DeltaNet linear attention (~75%), interleaved with full softmax attention using grouped-query attention and RoPE (~25%), on top of a 397B-parameter MoE that activates ~17B parameters per token from 512 experts. A single early-fusion backbone handles text, image, and video tokens, unifying the former Qwen3 (text) and Qwen3-VL (vision) lines. |
| Modalities | Text, Vision, Video, PDF |
| Status | available |
Benchmarks
- GPQA Diamond88.4%
- AIME 202691.3%
- LiveCodeBench v683.6%
- SWE-bench Verified76.4%
- MMLU-Pro87.8%
- IFBench76.5%
- Tau2-Bench86.7%
- MMMU85%
- MMMU-Pro79%
- OmniDocBench v1.590.8%
- Video-MME87.5%
- BrowseComp78.6%
- Terminal-Bench 2.052.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.54 / 1M tokens per 1M tokens |
|---|---|
| Output | $3.40 / 1M tokens per 1M tokens |
Third-party hosted pricing for the open-weight Qwen3.5-397B-A17B (FP8) on DeepInfra; the weights themselves are free to self-host under Apache 2.0. Alibaba's hosted Qwen3.5-Plus endpoint is priced separately.
Strengths
- Strong agentic and tool-use results: 86.7 on Tau2-Bench, 78.6 on BrowseComp, and 52.5 on Terminal-Bench 2.0
- Top-tier open-weight reasoning and coding — 88.4 GPQA Diamond, 91.3 AIME 2026, 83.6 LiveCodeBench v6, 76.4 SWE-bench Verified
- Native multimodality in an open model: text, images, video, and document understanding from one backbone (90.8 on OmniDocBench v1.5)
- Hybrid linear + softmax attention gives much higher long-context throughput than dense models at comparable quality
- Permissive Apache 2.0 license with a full ladder of sizes from 397B-A17B down to 0.8B for self-hosting
- Broad multilingual coverage across 201 languages and dialects
Best for
- Self-hosted agentic systems that need open weights for tool use, browsing, and terminal/coding workflows
- Multimodal document, chart, and video understanding pipelines on private infrastructure
- Long-context retrieval and analysis up to 256K tokens (1M via the hosted Plus endpoint)
- On-device and edge deployments using the smaller 27B/9B/4B/2B/0.8B dense variants
- Cost-sensitive high-throughput inference where the MoE's 17B active footprint cuts serving cost
- Multilingual assistants and translation across 201 languages
How to access
| Provider | Model ID |
|---|---|
| DeepInfra ↗ | Qwen/Qwen3.5-397B-A17B |
| Alibaba Cloud Model Studio (DashScope) ↗ | qwen3.5-plus |
Qwen (open-weight) — every version
The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Qwen3.6current | 2026-04 | — | Apache-2.0 |
| Qwen3.5 | 2026-02-16 | — | Apache-2.0 |
| Qwen3 (2507 update) | 2025-07 | — | Apache-2.0 |
| Qwen3 | 2025-04-28 | — | Apache-2.0 |
| Qwen2.5 | 2024-09 | — | Apache-2.0 |
| Qwen2 | 2024-06 | — | Apache-2.0 |
FAQ
Is Qwen3.5 open source?
The weights are open under the Apache 2.0 license and downloadable from Hugging Face, GitHub, and ModelScope, so you can self-host and use it commercially. The hosted Qwen3.5-Plus endpoint is a separate proprietary service.
How big is Qwen3.5 and how is it different from earlier Qwen models?
The flagship Qwen3.5-397B-A17B has 397 billion total parameters but activates only about 17 billion per token via a 512-expert MoE. Unlike earlier Qwen models it uses a hybrid architecture — mostly Gated DeltaNet linear attention with some full softmax attention — and is the first open-weight Qwen with native vision and video, merging the old Qwen3 and Qwen3-VL lines.
What context length does Qwen3.5 support?
The open-weight Qwen3.5-397B-A17B handles a native context window of roughly 256K tokens (around 262K). The hosted Qwen3.5-Plus endpoint extends this to 1 million tokens.
What modalities does Qwen3.5 handle?
It is a native multimodal model that accepts text, images, and video, and performs strong document understanding (90.8 on OmniDocBench v1.5). Audio is handled by the separate Qwen-Omni line, not Qwen3.5.