AI/TLDR

Qwen3.5

Alibaba's open-weight 397B-A17B multimodal agent model.

Overview

Qwen3.5 is the flagship open-weight model family released by Alibaba's Qwen team on 16 February 2026. The headline model, Qwen3.5-397B-A17B, is a sparse mixture-of-experts network with 397 billion total parameters that activates only about 17 billion per token, giving it the capability of a very large model at the inference cost of a much smaller one. It ships under the Apache 2.0 license with weights on Hugging Face, GitHub, and ModelScope.

Architecturally, Qwen3.5 moves away from a pure softmax-attention transformer. Roughly three quarters of its layers use Gated DeltaNet linear attention, with the remaining quarter using conventional grouped-query softmax attention and RoPE. This hybrid design is what lets Alibaba claim large throughput gains over the dense Qwen3-Max at long context lengths while matching its quality. It is also the first open-weight Qwen model with native vision and video input, folding the previous text-only Qwen3 and vision-focused Qwen3-VL lines into one early-fusion backbone.

The flagship was followed by smaller open-weight variants — Qwen3.5-122B-A10B, 35B-A3B, and 27B on 24 February 2026, then 9B, 4B, 2B, and 0.8B on 2 March 2026 — covering everything from data-center MoE to edge-sized dense models. A hosted, proprietary Qwen3.5-Plus endpoint extends the context window to 1M tokens. The series supports 201 languages and dialects (up from 119 in Qwen3) and runs in both reasoning and non-reasoning modes.

Released2026-02-16
LicenseApache-2.0
WeightsOpen weights
Parameters397B total / 17B active (MoE, 512 experts)
Context256K
ArchitectureHybrid sparse mixture-of-experts. Most layers use Gated DeltaNet linear attention (~75%), interleaved with full softmax attention using grouped-query attention and RoPE (~25%), on top of a 397B-parameter MoE that activates ~17B parameters per token from 512 experts. A single early-fusion backbone handles text, image, and video tokens, unifying the former Qwen3 (text) and Qwen3-VL (vision) lines.
ModalitiesText, Vision, Video, PDF
Statusavailable

Benchmarks

  1. GPQA Diamond88.4%
  2. AIME 202691.3%
  3. LiveCodeBench v683.6%
  4. SWE-bench Verified76.4%
  5. MMLU-Pro87.8%
  6. IFBench76.5%
  7. Tau2-Bench86.7%
  8. MMMU85%
  9. MMMU-Pro79%
  10. OmniDocBench v1.590.8%
  11. Video-MME87.5%
  12. BrowseComp78.6%
  13. Terminal-Bench 2.052.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.54 / 1M tokens per 1M tokens
Output$3.40 / 1M tokens per 1M tokens

Third-party hosted pricing for the open-weight Qwen3.5-397B-A17B (FP8) on DeepInfra; the weights themselves are free to self-host under Apache 2.0. Alibaba's hosted Qwen3.5-Plus endpoint is priced separately.

Pricing source ↗

Strengths

  • Strong agentic and tool-use results: 86.7 on Tau2-Bench, 78.6 on BrowseComp, and 52.5 on Terminal-Bench 2.0
  • Top-tier open-weight reasoning and coding — 88.4 GPQA Diamond, 91.3 AIME 2026, 83.6 LiveCodeBench v6, 76.4 SWE-bench Verified
  • Native multimodality in an open model: text, images, video, and document understanding from one backbone (90.8 on OmniDocBench v1.5)
  • Hybrid linear + softmax attention gives much higher long-context throughput than dense models at comparable quality
  • Permissive Apache 2.0 license with a full ladder of sizes from 397B-A17B down to 0.8B for self-hosting
  • Broad multilingual coverage across 201 languages and dialects

Best for

  • Self-hosted agentic systems that need open weights for tool use, browsing, and terminal/coding workflows
  • Multimodal document, chart, and video understanding pipelines on private infrastructure
  • Long-context retrieval and analysis up to 256K tokens (1M via the hosted Plus endpoint)
  • On-device and edge deployments using the smaller 27B/9B/4B/2B/0.8B dense variants
  • Cost-sensitive high-throughput inference where the MoE's 17B active footprint cuts serving cost
  • Multilingual assistants and translation across 201 languages

How to access

ProviderModel ID
DeepInfra ↗Qwen/Qwen3.5-397B-A17B
Alibaba Cloud Model Studio (DashScope) ↗qwen3.5-plus

Qwen (open-weight) — every version

The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Qwen3.6current2026-04Apache-2.0
Qwen3.52026-02-16Apache-2.0
Qwen3 (2507 update)2025-07Apache-2.0
Qwen32025-04-28Apache-2.0
Qwen2.52024-09Apache-2.0
Qwen22024-06Apache-2.0

FAQ

Is Qwen3.5 open source?

The weights are open under the Apache 2.0 license and downloadable from Hugging Face, GitHub, and ModelScope, so you can self-host and use it commercially. The hosted Qwen3.5-Plus endpoint is a separate proprietary service.

How big is Qwen3.5 and how is it different from earlier Qwen models?

The flagship Qwen3.5-397B-A17B has 397 billion total parameters but activates only about 17 billion per token via a 512-expert MoE. Unlike earlier Qwen models it uses a hybrid architecture — mostly Gated DeltaNet linear attention with some full softmax attention — and is the first open-weight Qwen with native vision and video, merging the old Qwen3 and Qwen3-VL lines.

What context length does Qwen3.5 support?

The open-weight Qwen3.5-397B-A17B handles a native context window of roughly 256K tokens (around 262K). The hosted Qwen3.5-Plus endpoint extends this to 1 million tokens.

What modalities does Qwen3.5 handle?

It is a native multimodal model that accepts text, images, and video, and performs strong document understanding (90.8 on OmniDocBench v1.5). Audio is handled by the separate Qwen-Omni line, not Qwen3.5.