Overview
Qwen2.5 is the large language model family released by Alibaba Cloud's Qwen team on September 19, 2024. It shipped as a broad lineup of seven dense sizes — Qwen2.5-0.5B, 1.5B, 3B, 7B, 14B, 32B and 72B — each in a base (pretrained) and an instruction-tuned (Instruct) variant, alongside GGUF, AWQ and GPTQ quantized builds. The flagship Qwen2.5-72B-Instruct was positioned against open models like Llama-3.1-70B and Mistral-Large, while the smaller sizes targeted on-device and cost-sensitive deployment.
All Qwen2.5 models were pretrained on a dataset of up to 18 trillion tokens, a large jump from the 7 trillion used for Qwen2. Most models support a 128K-token (131,072) context window and can generate up to 8K tokens, with multilingual coverage spanning more than 29 languages including Chinese, English, French, Spanish, German, Russian, Japanese, Korean, Arabic and more. The release was accompanied by specialized siblings — Qwen2.5-Coder (1.5B/7B/32B) for programming and Qwen2.5-Math (1.5B/7B/72B) — plus hosted MoE variants Qwen2.5-Turbo and Qwen2.5-Plus on Alibaba Cloud Model Studio.
Licensing is mixed by size: all the open-weight models except the 3B and 72B are Apache 2.0, while the 3B and 72B use the custom Qwen License. Qwen2.5 has since been superseded by Qwen3, but its weights remain freely downloadable on Hugging Face and ModelScope and it stayed a popular fine-tuning and self-hosting base well after release. In January 2025 the team added Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, extending the context window to one million tokens.
| Released | 2024-09 |
|---|---|
| License | Mixed: most sizes Apache 2.0; the 3B and 72B variants use the Qwen License (a custom, source-available license with attribution and a 100M-MAU commercial threshold). The 0.5B/1.5B/7B/14B/32B sizes are Apache 2.0. |
| Weights | Open weights |
| Parameters | Seven dense sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B (flagship Qwen2.5-72B is 72.7B total / 70.0B non-embedding) |
| Context | 128K tokens (131,072) for most models; the 7B and 14B "-1M" variants released January 2025 extend to 1M tokens |
| Max output | 8K tokens (8,192) |
| Architecture | Decoder-only Transformer with Grouped Query Attention (GQA), SwiGLU feed-forward activations, RMSNorm pre-normalization, Rotary Position Embeddings (RoPE), and QKV attention bias. The 72B model has 80 layers with 64 query heads and 8 key-value heads. Long context beyond 32K is enabled via YaRN scaling. Pretrained on up to 18 trillion tokens; multilingual support for 29+ languages. |
| Knowledge cutoff | Not officially published by Qwen; community reports cite October 2023 for the base models. Omitted as unverified. |
| Modalities | text |
| Status | Superseded by Qwen3 (released 2025); Qwen2.5 weights remain openly available on Hugging Face and ModelScope. |
Benchmarks
- MMLU (5-shot)86.1%
- MMLU-Pro71.1%
- GPQA49%
- MATH83.1%
- GSM8K95.8%
- HumanEval86.6%
- MBPP88.2%
- Arena-Hard81.2%
- LiveCodeBench (2305-2409)55.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- Wide size ladder (0.5B to 72B) lets teams trade quality for cost and pick an on-device or server tier from one consistent family
- 128K-token context on most sizes, with 1M-token -1M variants for very long documents
- Strong math and coding for its era — Qwen2.5-72B-Instruct scores 83.1 on MATH and 86.6 on HumanEval
- Mostly Apache 2.0 licensing (all sizes except 3B and 72B) makes commercial self-hosting straightforward
- Broad multilingual support across 29+ languages
- Dedicated Coder and Math model lines built on the same base for specialized workloads
Best for
- Self-hosted chat assistants and RAG backends where open weights and data control matter
- Cost-efficient on-device or edge inference using the 0.5B-3B sizes
- Code generation and completion via the Qwen2.5-Coder line
- Math and quantitative reasoning via Qwen2.5-Math
- Fine-tuning a strong open base for domain-specific or multilingual tasks
- Long-document summarization and analysis using the 128K (or 1M) context variants
How to access
| Provider | Model ID |
|---|---|
| Alibaba Cloud Model Studio ↗ | qwen2.5-72b-instruct |
| Hugging Face ↗ | Qwen/Qwen2.5-72B-Instruct |
Qwen (open-weight) — every version
The full lineage of the Qwen (open-weight) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Qwen3.6current | 2026-04 | — | Apache-2.0 |
| Qwen3.5 | 2026-02-16 | — | Apache-2.0 |
| Qwen3 (2507 update) | 2025-07 | — | Apache-2.0 |
| Qwen3 | 2025-04-28 | — | Apache-2.0 |
| Qwen2.5 | 2024-09 | — | Apache-2.0 |
| Qwen2 | 2024-06 | — | Apache-2.0 |
FAQ
Is Qwen2.5 open source and free to use?
The weights are openly downloadable, but licensing depends on size. All Qwen2.5 sizes are Apache 2.0 except the 3B and 72B, which use the custom Qwen License (source-available with attribution and a 100M-monthly-active-user commercial threshold). You can self-host any size from Hugging Face or ModelScope.
What sizes does Qwen2.5 come in?
Seven dense sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B and 72B, each in base and instruction-tuned variants. There are also specialized Qwen2.5-Coder (1.5B/7B/32B) and Qwen2.5-Math (1.5B/7B/72B) lines, plus hosted MoE models Qwen2.5-Turbo and Qwen2.5-Plus.
What is the context window of Qwen2.5?
Most Qwen2.5 models support a 128K-token (131,072) context window and can generate up to 8K tokens. In January 2025 the team released Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, which extend the input context to one million tokens.
Has Qwen2.5 been replaced?
Yes — Qwen2.5 has been superseded by Qwen3, released in 2025. However, Qwen2.5 weights remain freely available and continue to be used as a fine-tuning and self-hosting base.