Qwen3.5

Alibaba's open-weight 397B-A17B multimodal agent model.

Overview

Qwen3.5 is the flagship open-weight model family released by Alibaba's Qwen team on 16 February 2026. The headline model, Qwen3.5-397B-A17B, is a sparse mixture-of-experts network with 397 billion total parameters that activates only about 17 billion per token, giving it the capability of a very large model at the inference cost of a much smaller one. It ships under the Apache 2.0 license with weights on Hugging Face, GitHub, and ModelScope.

Architecturally, Qwen3.5 moves away from a pure softmax-attention transformer. Roughly three quarters of its layers use Gated DeltaNet linear attention, with the remaining quarter using conventional grouped-query softmax attention and RoPE. This hybrid design is what lets Alibaba claim large throughput gains over the dense Qwen3-Max at long context lengths while matching its quality. It is also the first open-weight Qwen model with native vision and video input, folding the previous text-only Qwen3 and vision-focused Qwen3-VL lines into one early-fusion backbone.

The flagship was followed by smaller open-weight variants — Qwen3.5-122B-A10B, 35B-A3B, and 27B on 24 February 2026, then 9B, 4B, 2B, and 0.8B on 2 March 2026 — covering everything from data-center MoE to edge-sized dense models. A hosted, proprietary Qwen3.5-Plus endpoint extends the context window to 1M tokens. The series supports 201 languages and dialects (up from 119 in Qwen3) and runs in both reasoning and non-reasoning modes.

Released	2026-02-16
License	Apache-2.0
Weights	Open weights
Parameters	397B total / 17B active (MoE, 512 experts)
Context	256K
Architecture	Hybrid sparse mixture-of-experts. Most layers use Gated DeltaNet linear attention (~75%), interleaved with full softmax attention using grouped-query attention and RoPE (~25%), on top of a 397B-parameter MoE that activates ~17B parameters per token from 512 experts. A single early-fusion backbone handles text, image, and video tokens, unifying the former Qwen3 (text) and Qwen3-VL (vision) lines.
Modalities	Text, Vision, Video, PDF
Status	available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.54 / 1M tokens per 1M tokens
Output	$3.40 / 1M tokens per 1M tokens

Third-party hosted pricing for the open-weight Qwen3.5-397B-A17B (FP8) on DeepInfra; the weights themselves are free to self-host under Apache 2.0. Alibaba's hosted Qwen3.5-Plus endpoint is priced separately.

Pricing source ↗

Strengths

Strong agentic and tool-use results: 86.7 on Tau2-Bench, 78.6 on BrowseComp, and 52.5 on Terminal-Bench 2.0
Top-tier open-weight reasoning and coding — 88.4 GPQA Diamond, 91.3 AIME 2026, 83.6 LiveCodeBench v6, 76.4 SWE-bench Verified
Native multimodality in an open model: text, images, video, and document understanding from one backbone (90.8 on OmniDocBench v1.5)
Hybrid linear + softmax attention gives much higher long-context throughput than dense models at comparable quality
Permissive Apache 2.0 license with a full ladder of sizes from 397B-A17B down to 0.8B for self-hosting
Broad multilingual coverage across 201 languages and dialects

Best for

Self-hosted agentic systems that need open weights for tool use, browsing, and terminal/coding workflows
Multimodal document, chart, and video understanding pipelines on private infrastructure
Long-context retrieval and analysis up to 256K tokens (1M via the hosted Plus endpoint)
On-device and edge deployments using the smaller 27B/9B/4B/2B/0.8B dense variants
Cost-sensitive high-throughput inference where the MoE's 17B active footprint cuts serving cost
Multilingual assistants and translation across 201 languages

How to access

Provider	Model ID
DeepInfra ↗	`Qwen/Qwen3.5-397B-A17B`
Alibaba Cloud Model Studio (DashScope) ↗	`qwen3.5-plus`

Qwen (open-weight) — every version

The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Qwen3.6current	2026-04	—	Apache-2.0
Qwen3.5	2026-02-16	—	Apache-2.0
Qwen3 (2507 update)	2025-07	—	Apache-2.0
Qwen3	2025-04-28	—	Apache-2.0
Qwen2.5	2024-09	—	Apache-2.0
Qwen2	2024-06	—	Apache-2.0

FAQ

Is Qwen3.5 open source?

The weights are open under the Apache 2.0 license and downloadable from Hugging Face, GitHub, and ModelScope, so you can self-host and use it commercially. The hosted Qwen3.5-Plus endpoint is a separate proprietary service.

How big is Qwen3.5 and how is it different from earlier Qwen models?

The flagship Qwen3.5-397B-A17B has 397 billion total parameters but activates only about 17 billion per token via a 512-expert MoE. Unlike earlier Qwen models it uses a hybrid architecture — mostly Gated DeltaNet linear attention with some full softmax attention — and is the first open-weight Qwen with native vision and video, merging the old Qwen3 and Qwen3-VL lines.

What context length does Qwen3.5 support?

The open-weight Qwen3.5-397B-A17B handles a native context window of roughly 256K tokens (around 262K). The hosted Qwen3.5-Plus endpoint extends this to 1 million tokens.

What modalities does Qwen3.5 handle?

It is a native multimodal model that accepts text, images, and video, and performs strong document understanding (90.8 on OmniDocBench v1.5). Audio is handled by the separate Qwen-Omni line, not Qwen3.5.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Qwen (open-weight) — every version

// FAQ