Qwen2.5

Alibaba's open-weight model family spanning 0.5B to 72B with 128K context

Overview

Qwen2.5 is the large language model family released by Alibaba Cloud's Qwen team on September 19, 2024. It shipped as a broad lineup of seven dense sizes — Qwen2.5-0.5B, 1.5B, 3B, 7B, 14B, 32B and 72B — each in a base (pretrained) and an instruction-tuned (Instruct) variant, alongside GGUF, AWQ and GPTQ quantized builds. The flagship Qwen2.5-72B-Instruct was positioned against open models like Llama-3.1-70B and Mistral-Large, while the smaller sizes targeted on-device and cost-sensitive deployment.

All Qwen2.5 models were pretrained on a dataset of up to 18 trillion tokens, a large jump from the 7 trillion used for Qwen2. Most models support a 128K-token (131,072) context window and can generate up to 8K tokens, with multilingual coverage spanning more than 29 languages including Chinese, English, French, Spanish, German, Russian, Japanese, Korean, Arabic and more. The release was accompanied by specialized siblings — Qwen2.5-Coder (1.5B/7B/32B) for programming and Qwen2.5-Math (1.5B/7B/72B) — plus hosted MoE variants Qwen2.5-Turbo and Qwen2.5-Plus on Alibaba Cloud Model Studio.

Licensing is mixed by size: all the open-weight models except the 3B and 72B are Apache 2.0, while the 3B and 72B use the custom Qwen License. Qwen2.5 has since been superseded by Qwen3, but its weights remain freely downloadable on Hugging Face and ModelScope and it stayed a popular fine-tuning and self-hosting base well after release. In January 2025 the team added Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, extending the context window to one million tokens.

Released	2024-09
License	Mixed: most sizes Apache 2.0; the 3B and 72B variants use the Qwen License (a custom, source-available license with attribution and a 100M-MAU commercial threshold). The 0.5B/1.5B/7B/14B/32B sizes are Apache 2.0.
Weights	Open weights
Parameters	Seven dense sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B (flagship Qwen2.5-72B is 72.7B total / 70.0B non-embedding)
Context	128K tokens (131,072) for most models; the 7B and 14B "-1M" variants released January 2025 extend to 1M tokens
Max output	8K tokens (8,192)
Architecture	Decoder-only Transformer with Grouped Query Attention (GQA), SwiGLU feed-forward activations, RMSNorm pre-normalization, Rotary Position Embeddings (RoPE), and QKV attention bias. The 72B model has 80 layers with 64 query heads and 8 key-value heads. Long context beyond 32K is enabled via YaRN scaling. Pretrained on up to 18 trillion tokens; multilingual support for 29+ languages.
Knowledge cutoff	Not officially published by Qwen; community reports cite October 2023 for the base models. Omitted as unverified.
Modalities	text
Status	Superseded by Qwen3 (released 2025); Qwen2.5 weights remain openly available on Hugging Face and ModelScope.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

Wide size ladder (0.5B to 72B) lets teams trade quality for cost and pick an on-device or server tier from one consistent family
128K-token context on most sizes, with 1M-token -1M variants for very long documents
Strong math and coding for its era — Qwen2.5-72B-Instruct scores 83.1 on MATH and 86.6 on HumanEval
Mostly Apache 2.0 licensing (all sizes except 3B and 72B) makes commercial self-hosting straightforward
Broad multilingual support across 29+ languages
Dedicated Coder and Math model lines built on the same base for specialized workloads

Best for

Self-hosted chat assistants and RAG backends where open weights and data control matter
Cost-efficient on-device or edge inference using the 0.5B-3B sizes
Code generation and completion via the Qwen2.5-Coder line
Math and quantitative reasoning via Qwen2.5-Math
Fine-tuning a strong open base for domain-specific or multilingual tasks
Long-document summarization and analysis using the 128K (or 1M) context variants

How to access

Provider	Model ID
Alibaba Cloud Model Studio ↗	`qwen2.5-72b-instruct`
Hugging Face ↗	`Qwen/Qwen2.5-72B-Instruct`

Qwen (open-weight) — every version

The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Qwen3.6current	2026-04	—	Apache-2.0
Qwen3.5	2026-02-16	—	Apache-2.0
Qwen3 (2507 update)	2025-07	—	Apache-2.0
Qwen3	2025-04-28	—	Apache-2.0
Qwen2.5	2024-09	—	Apache-2.0
Qwen2	2024-06	—	Apache-2.0

FAQ

Is Qwen2.5 open source and free to use?

The weights are openly downloadable, but licensing depends on size. All Qwen2.5 sizes are Apache 2.0 except the 3B and 72B, which use the custom Qwen License (source-available with attribution and a 100M-monthly-active-user commercial threshold). You can self-host any size from Hugging Face or ModelScope.

What sizes does Qwen2.5 come in?

Seven dense sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B and 72B, each in base and instruction-tuned variants. There are also specialized Qwen2.5-Coder (1.5B/7B/32B) and Qwen2.5-Math (1.5B/7B/72B) lines, plus hosted MoE models Qwen2.5-Turbo and Qwen2.5-Plus.

What is the context window of Qwen2.5?

Most Qwen2.5 models support a 128K-token (131,072) context window and can generate up to 8K tokens. In January 2025 the team released Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, which extend the input context to one million tokens.

Has Qwen2.5 been replaced?

Yes — Qwen2.5 has been superseded by Qwen3, released in 2025. However, Qwen2.5 weights remain freely available and continue to be used as a fine-tuning and self-hosting base.

// Overview

// Benchmarks

// Strengths

// Best for

// How to access

// Qwen (open-weight) — every version

// FAQ