AI/TLDR

Qwen2

Alibaba's June 2024 open-weight family: 0.5B–72B dense plus a 57B-A14B MoE, GQA throughout and up to 128K context.

Overview

Qwen2 is the open-weight large language model family Alibaba's Qwen team released on June 7, 2024, succeeding Qwen1.5. It ships in five sizes: dense Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B and Qwen2-72B, plus a Mixture-of-Experts model, Qwen2-57B-A14B, that activates roughly 14B of its 57B parameters per token via 64 routed and 8 shared experts. Every size uses Grouped Query Attention (GQA), and the family shares a 151,643-token multilingual vocabulary covering English, Chinese and 27 additional languages.

Each base model ships alongside an instruction-tuned -Instruct variant. Context length varies by size: 32K tokens for the 0.5B and 1.5B, 64K for the 57B-A14B-Instruct, and up to 128K (131,072) tokens for the 7B-Instruct and 72B-Instruct. The flagship Qwen2-72B base model posts 84.2 on MMLU, 89.5 on GSM8K, 64.6 on HumanEval and 51.1 on MATH, beating contemporaries such as Llama 3 70B and Mixtral 8x22B on the Qwen team's reported numbers, while Qwen2-72B-Instruct reaches 86.0 on HumanEval and 91.1 on GSM8K.

On licensing, Qwen2 split its terms: the 0.5B, 1.5B, 7B and 57B-A14B models are released under Apache 2.0 for broad commercial use, while the 72B model keeps the more restrictive Tongyi Qianwen License. Qwen2 was the open-weight base that the much wider Qwen2.5 generation (September 2024) and later Qwen3 built on, and although it has been superseded, the weights remain freely downloadable from Hugging Face and ModelScope.

Released2024-06-07
LicenseApache 2.0 (0.5B, 1.5B, 7B, 57B-A14B); Tongyi Qianwen License (72B)
WeightsOpen weights
Parameters5 models: 0.5B, 1.5B, 7B and 72B dense, plus a 57B-A14B MoE (14B activated, 64 routed + 8 shared experts)
ContextUp to 128K (131,072) tokens
ArchitectureTransformer decoder with SwiGLU activation, attention QKV bias, RoPE, and Grouped Query Attention (GQA) across all sizes; the 57B-A14B variant is a Mixture-of-Experts (64 routed + 8 shared experts, 14B activated). Shared 151,643-token vocabulary; tied embeddings on the 0.5B/1.5B. Pretrained on up to 12T tokens.
ModalitiesText
StatusSuperseded — replaced by Qwen2.5 (Sept 2024) and later Qwen3; weights remain available on Hugging Face.

Benchmarks

  1. MMLU (5-shot, base 72B)84.2%
  2. MMLU-Pro (base 72B)55.6%
  3. GPQA (base 72B)37.9%
  4. HumanEval (base 72B)64.6%
  5. MBPP (base 72B)76.9%
  6. GSM8K (base 72B)89.5%
  7. MATH (base 72B)51.1%
  8. HumanEval (72B-Instruct)86%
  9. GSM8K (72B-Instruct)91.1%
  10. MATH (72B-Instruct)59.7%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

  • Permissive Apache 2.0 license on all sizes except the 72B, enabling broad commercial and self-hosted use
  • Wide size range — from a 0.5B edge model to a 72B flagship and a 57B-A14B MoE — covering on-device through server deployments
  • GQA across every size for faster inference and lower KV-cache memory
  • Up to 128K-token context on the 7B-Instruct and 72B-Instruct variants
  • Strong multilingual coverage: English, Chinese and 27 additional languages
  • Competitive 2024 benchmark scores: 84.2 MMLU and 89.5 GSM8K on the 72B base model

Best for

  • Self-hosted, license-friendly LLM deployment via Apache 2.0 weights (sub-72B sizes)
  • On-device and edge inference with the 0.5B and 1.5B models
  • Long-context document processing using the 128K-context Instruct variants
  • Multilingual chat and assistant applications across 29 languages
  • Cost-efficient MoE serving with Qwen2-57B-A14B (14B activated parameters)
  • Coding and math tasks via Qwen2-Instruct and the companion Qwen2-Math models

How to access

ProviderModel ID
Hugging Face ↗Qwen/Qwen2-72B-Instruct
Hugging Face ↗Qwen/Qwen2-7B-Instruct
Hugging Face ↗Qwen/Qwen2-57B-A14B-Instruct

Qwen (open-weight) — every version

The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Qwen3.6current2026-04Apache-2.0
Qwen3.52026-02-16Apache-2.0
Qwen3 (2507 update)2025-07Apache-2.0
Qwen32025-04-28Apache-2.0
Qwen2.52024-09Apache-2.0
Qwen22024-06Apache-2.0

FAQ

When was Qwen2 released and what sizes does it include?

Qwen2 was released on June 7, 2024. It includes five models: dense Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B and Qwen2-72B, plus a Mixture-of-Experts model, Qwen2-57B-A14B, which activates about 14B of its 57B parameters per token. Each ships with an instruction-tuned -Instruct variant.

What license does Qwen2 use?

It depends on the size. The 0.5B, 1.5B, 7B and 57B-A14B models use the permissive Apache 2.0 license, while the largest 72B model is released under the more restrictive Tongyi Qianwen License.

How long is Qwen2's context window?

Context length varies by model: 32K tokens for the 0.5B and 1.5B, 64K for the 57B-A14B-Instruct, and up to 128K (131,072) tokens for the 7B-Instruct and 72B-Instruct.

Is Qwen2 still the latest version?

No. Qwen2 was superseded by Qwen2.5 in September 2024 and later by Qwen3, which added more sizes, longer context and stronger coding and math performance. Qwen2's weights remain freely available on Hugging Face and ModelScope.