Qwen2

Alibaba's June 2024 open-weight family: 0.5B–72B dense plus a 57B-A14B MoE, GQA throughout and up to 128K context.

Overview

Qwen2 is the open-weight large language model family Alibaba's Qwen team released on June 7, 2024, succeeding Qwen1.5. It ships in five sizes: dense Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B and Qwen2-72B, plus a Mixture-of-Experts model, Qwen2-57B-A14B, that activates roughly 14B of its 57B parameters per token via 64 routed and 8 shared experts. Every size uses Grouped Query Attention (GQA), and the family shares a 151,643-token multilingual vocabulary covering English, Chinese and 27 additional languages.

Each base model ships alongside an instruction-tuned -Instruct variant. Context length varies by size: 32K tokens for the 0.5B and 1.5B, 64K for the 57B-A14B-Instruct, and up to 128K (131,072) tokens for the 7B-Instruct and 72B-Instruct. The flagship Qwen2-72B base model posts 84.2 on MMLU, 89.5 on GSM8K, 64.6 on HumanEval and 51.1 on MATH, beating contemporaries such as Llama 3 70B and Mixtral 8x22B on the Qwen team's reported numbers, while Qwen2-72B-Instruct reaches 86.0 on HumanEval and 91.1 on GSM8K.

On licensing, Qwen2 split its terms: the 0.5B, 1.5B, 7B and 57B-A14B models are released under Apache 2.0 for broad commercial use, while the 72B model keeps the more restrictive Tongyi Qianwen License. Qwen2 was the open-weight base that the much wider Qwen2.5 generation (September 2024) and later Qwen3 built on, and although it has been superseded, the weights remain freely downloadable from Hugging Face and ModelScope.

Released	2024-06-07
License	Apache 2.0 (0.5B, 1.5B, 7B, 57B-A14B); Tongyi Qianwen License (72B)
Weights	Open weights
Parameters	5 models: 0.5B, 1.5B, 7B and 72B dense, plus a 57B-A14B MoE (14B activated, 64 routed + 8 shared experts)
Context	Up to 128K (131,072) tokens
Architecture	Transformer decoder with SwiGLU activation, attention QKV bias, RoPE, and Grouped Query Attention (GQA) across all sizes; the 57B-A14B variant is a Mixture-of-Experts (64 routed + 8 shared experts, 14B activated). Shared 151,643-token vocabulary; tied embeddings on the 0.5B/1.5B. Pretrained on up to 12T tokens.
Modalities	Text
Status	Superseded — replaced by Qwen2.5 (Sept 2024) and later Qwen3; weights remain available on Hugging Face.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

Permissive Apache 2.0 license on all sizes except the 72B, enabling broad commercial and self-hosted use
Wide size range — from a 0.5B edge model to a 72B flagship and a 57B-A14B MoE — covering on-device through server deployments
GQA across every size for faster inference and lower KV-cache memory
Up to 128K-token context on the 7B-Instruct and 72B-Instruct variants
Strong multilingual coverage: English, Chinese and 27 additional languages
Competitive 2024 benchmark scores: 84.2 MMLU and 89.5 GSM8K on the 72B base model

Best for

Self-hosted, license-friendly LLM deployment via Apache 2.0 weights (sub-72B sizes)
On-device and edge inference with the 0.5B and 1.5B models
Long-context document processing using the 128K-context Instruct variants
Multilingual chat and assistant applications across 29 languages
Cost-efficient MoE serving with Qwen2-57B-A14B (14B activated parameters)
Coding and math tasks via Qwen2-Instruct and the companion Qwen2-Math models

How to access

Provider	Model ID
Hugging Face ↗	`Qwen/Qwen2-72B-Instruct`
Hugging Face ↗	`Qwen/Qwen2-7B-Instruct`
Hugging Face ↗	`Qwen/Qwen2-57B-A14B-Instruct`

Qwen (open-weight) — every version

The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Qwen3.6current	2026-04	—	Apache-2.0
Qwen3.5	2026-02-16	—	Apache-2.0
Qwen3 (2507 update)	2025-07	—	Apache-2.0
Qwen3	2025-04-28	—	Apache-2.0
Qwen2.5	2024-09	—	Apache-2.0
Qwen2	2024-06	—	Apache-2.0

FAQ

When was Qwen2 released and what sizes does it include?

Qwen2 was released on June 7, 2024. It includes five models: dense Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B and Qwen2-72B, plus a Mixture-of-Experts model, Qwen2-57B-A14B, which activates about 14B of its 57B parameters per token. Each ships with an instruction-tuned -Instruct variant.

What license does Qwen2 use?

It depends on the size. The 0.5B, 1.5B, 7B and 57B-A14B models use the permissive Apache 2.0 license, while the largest 72B model is released under the more restrictive Tongyi Qianwen License.

How long is Qwen2's context window?

Context length varies by model: 32K tokens for the 0.5B and 1.5B, 64K for the 57B-A14B-Instruct, and up to 128K (131,072) tokens for the 7B-Instruct and 72B-Instruct.

Is Qwen2 still the latest version?

No. Qwen2 was superseded by Qwen2.5 in September 2024 and later by Qwen3, which added more sizes, longer context and stronger coding and math performance. Qwen2's weights remain freely available on Hugging Face and ModelScope.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// Qwen (open-weight) — every version

// FAQ