Alibaba / Qwen · 2026-04-17 · major
Qwen3.6-35B-A3B — 35B MoE Coding Model, 3B Active Params, SWE-bench 73.4%
Alibaba open-sources a 35B MoE model activating only 3B parameters per token. Scores 73.4% SWE-bench Verified and 86.0% GPQA. Fits in ~20 GB locally. Apache 2.0.

Alibaba's 35B MoE model uses only 3B active params, scores 73.4% SWE-bench Verified, and runs locally on a MacBook Pro.
Key specs
| License | Apache 2.0 |
|---|---|
| Active params | 3B |
| Context window | 262K tokens |
| SWE-bench | 73.4% |
| GPQA | 86.0% |
| Total parameters | 35B |
| Aime 2026 | 92.7% |
| Live code bench v6 | 80.4% |
What is it?
Qwen3.6-35B-A3B is a sparse Mixture-of-Experts model released under Apache 2.0. With 256 experts and only 3B parameters activated per forward pass, it delivers coding and reasoning performance comparable to dense models far larger than its active size. It supports text, image, and video inputs with a native 262K-token context window extensible to ~1M tokens via YaRN.
How does it work?
The model uses Gated DeltaNet (a hybrid attention variant) combined with a 256-expert MoE feed-forward layer. A preserve_thinking flag retains reasoning traces across multi-turn agent conversations, reducing redundant re-reasoning steps. Multi-Token Prediction enables speculative decoding for higher throughput. Weights in BF16 run locally via vLLM, SGLang, or LM Studio.
Why does it matter?
At 3B active parameters, inference cost is a fraction of comparable dense models. Scoring 73.4% on SWE-bench Verified puts it in frontier coding territory while running locally. The preserve_thinking feature and native tool-calling reduce prompt engineering overhead for agentic coding workflows.
Who is it for?
Developers building coding agents, teams wanting frontier-class reasoning locally, agentic pipelines needing long context at low inference cost.
Try it
huggingface-cli download Qwen/Qwen3.6-35B-A3B # or: qwen3.6-flash on Alibaba Cloud Model Studio