AI/TLDR

Qwen · 2026-04-30 · notable

Qwen-Scope — Open Sparse-Autoencoder Suite for Steering Qwen3 and Qwen3.5 Without Prompt Engineering

Alibaba's Qwen team open-sources 14 SAE checkpoints across 7 Qwen3/3.5 variants, with applications spanning inference steering, benchmark analysis, toxicity classification, and post-training cleanup.


An open SAE toolkit that turns Qwen3/3.5 internal features into knobs you can twist to steer outputs and audit benchmarks.

Key specs

SAE checkpoints: 14
Model variants: 7
Top-k: 50 or 100
SAE width (dense): 16x hidden size
SAE width (MoE): 32K to 128K (16x to 64x expansion)
Code-switching reduction: over 50%
Safety feature coverage: 99.74%
Benchmark Spearman correlation: 0.85

What is it?

Qwen-Scope is a research-grade suite of sparse autoencoders (SAEs) trained on the residual streams of seven Qwen3 and Qwen3.5 variants — from 1.7B up to the 35B-A3B MoE — released openly on Hugging Face with a companion paper and demo Space.

How does it work?

Each SAE uses a Top-k activation rule (k = 50 or 100) over an overcomplete latent basis. For dense backbones the SAE width is 16× the hidden size; for MoE backbones widths scale from 32K up to 128K (16×–64× expansion). The team ships four reference workflows: inference-time steering by suppressing or amplifying latent features, SAE-based benchmark analysis (~0.85 Spearman with performance-based redundancy across 17 benchmarks), data-centric workflows for toxicity classifiers and safety-data synthesis (99.74% feature coverage), and post-training fixes that cut code-switching ratios by over 50%.
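
The Top-k rule and the steering workflow described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the Qwen-Scope code: the weights are random rather than trained, the sizes (HIDDEN = 8, K = 4) are scaled down for readability, the ReLU on the surviving latents and the additive steering rule are common choices assumed here, and the function names (topk_encode, decode, steer) are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def topk_encode(x, w_enc, b_enc, k):
    """Encode x into a sparse latent vector, keeping only the k largest pre-activations."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    latents = [0.0] * len(pre)
    for i in keep:
        latents[i] = max(pre[i], 0.0)  # ReLU on the survivors (assumed, not from the source)
    return latents


def decode(latents, w_dec, b_dec):
    """Reconstruct the residual vector; w_dec holds one decoder direction per latent."""
    out = list(b_dec)
    for z_i, direction in zip(latents, w_dec):
        if z_i != 0.0:
            for j, d_j in enumerate(direction):
                out[j] += z_i * d_j
    return out


def steer(x, w_enc, b_enc, w_dec, b_dec, k, feature, scale):
    """Rescale one latent feature and apply the resulting change directly to x.

    scale = 0 suppresses the feature, scale > 1 amplifies it, scale = 1 is a no-op.
    Editing x by a delta (instead of replacing it with the reconstruction)
    leaves the SAE's reconstruction error untouched.
    """
    z = topk_encode(x, w_enc, b_enc, k)
    delta = (scale - 1.0) * z[feature]
    return [x_j + delta * d_j for x_j, d_j in zip(x, w_dec[feature])]


random.seed(0)
HIDDEN = 8           # toy hidden size (hypothetical; real models are far larger)
WIDTH = 16 * HIDDEN  # 16x expansion, as quoted for dense backbones
K = 4                # scaled down from the k = 50 or 100 used in the release

w_enc = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(WIDTH)]
b_enc = [0.0] * WIDTH
w_dec = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(WIDTH)]
b_dec = [0.0] * HIDDEN
x = [random.gauss(0, 1) for _ in range(HIDDEN)]  # stand-in for one residual-stream vector

z = topk_encode(x, w_enc, b_enc, K)
x_hat = decode(z, w_dec, b_dec)
active = [i for i, v in enumerate(z) if v > 0.0]

suppressed = steer(x, w_enc, b_enc, w_dec, b_dec, K, active[0], scale=0.0)
amplified = steer(x, w_enc, b_enc, w_dec, b_dec, K, active[0], scale=3.0)
print(len(active), "active latents out of", WIDTH)
```

With scale = 0 the chosen feature's contribution is subtracted from the residual vector, and with scale > 1 it is amplified. A feature that is not among the top k is left unchanged by this rule, so pushing an inactive feature would need a fixed additive offset along its decoder direction instead.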

Why does it matter?

SAEs have been a mostly academic curiosity, tied to small models from Anthropic and a few other labs. Shipping production-scale SAEs for a popular open model family, with concrete recipes for steering, eval analysis, and RL/SFT cleanup, moves interpretability out of research-paper territory and into the hands of anyone shipping a Qwen3 fine-tune.

Who is it for?

ML researchers, alignment teams, Qwen fine-tuners

Try it

https://huggingface.co/collections/Qwen/qwen-scope

Tags

  • qwen
  • qwen-scope
  • sparse-autoencoders
  • interpretability
  • steering
  • qwen3
  • qwen3-5
  • alibaba
  • open-source
