Qwen2.5-Coder

Alibaba's open-weight code LLM family (0.5B–32B); the 32B matched GPT-4o on coding

Overview

Qwen2.5-Coder is the code-specialized LLM family from Alibaba's Qwen team, released on November 12, 2024. It spans six dense model sizes — 0.5B, 1.5B, 3B, 7B, 14B, and 32B — each available in base and instruction-tuned (Instruct) variants, so the same architecture covers everything from on-device autocomplete to a server-grade coding assistant. All sizes are open weights under Apache 2.0, except the 3B model, which is released under the Qwen Research license.

The models build on Qwen2.5 and were continued-pretrained on 5.5 trillion tokens of source code, text-code grounding data, synthetic data, math, and general text. The flagship Qwen2.5-Coder-32B-Instruct was positioned by Alibaba as a state-of-the-art open coding model of its generation, with code-generation ability comparable to GPT-4o: it scored 92.7% on HumanEval and led open models on EvalPlus, LiveCodeBench, and BigCodeBench. The 7B and 14B variants offer strong, cheaper alternatives, while the 0.5B–3B models target latency-sensitive and local use.

Qwen2.5-Coder supports up to 128K-token context (131,072 tokens, via YaRN) on the 7B/14B/32B models and 32K on the smaller sizes, plus fill-in-the-middle infilling for editor-style completion. Weights are distributed on Hugging Face, ModelScope, and Ollama, and the models are reachable through hosted APIs such as OpenRouter. Alibaba's Qwen-Coder line later continued with Qwen3-Coder in July 2025, but Qwen2.5-Coder remains a popular, freely self-hostable option.

Released	2024-11-12
License	Apache 2.0 (all sizes except the 3B model, which uses the Qwen Research license)
Weights	Open weights
Parameters	Family of 6 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B (flagship 32B = 32.5B total / 31.0B non-embedding)
Context	128K (131,072 tokens) for 7B/14B/32B; 32K for 0.5B/1.5B/3B
Max output	8,192 tokens (default generation); hosted endpoints commonly cap output at 32,768
Architecture	Dense, decoder-only causal language model based on the Qwen2.5 architecture, using RoPE positional embeddings, SwiGLU activations, RMSNorm, and grouped-query attention with QKV bias. The flagship Qwen2.5-Coder-32B has 64 layers, 40 query heads and 8 key-value heads (GQA), and a hidden size of 5,120 over a ~151.6K-token vocabulary. Continued pre-training on 5.5 trillion tokens (roughly a 70:20:10 mix of code, text, and math), with file-level training at 8,192 tokens, repo-level training at 32,768 tokens, and extension to 128K (131,072 tokens) via YaRN. Supports fill-in-the-middle (FIM) infilling.
Knowledge cutoff	Not officially published for Qwen2.5-Coder; omitted to avoid guessing
Modalities	Text
Status	Available (open weights). Superseded by Qwen3-Coder (July 2025) as Alibaba's flagship coding line, but Qwen2.5-Coder weights remain downloadable and widely used.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.66 / 1M tokens per 1M tokens
Output	$1.00 / 1M tokens per 1M tokens

Qwen2.5-Coder is open weights and free to self-host. Rates shown are for the hosted Qwen2.5-Coder-32B-Instruct endpoint on OpenRouter; other providers and the smaller sizes vary. Alibaba's own Model Studio now lists Qwen3-Coder rather than Qwen2.5-Coder.

Pricing source ↗

Strengths

Open weights under Apache 2.0 (except the 3B, which uses the Qwen Research license) — self-hostable and commercial-friendly
Six sizes from 0.5B to 32B let teams trade off quality, cost, and latency on the same architecture
Flagship 32B-Instruct rivaled GPT-4o on code generation (92.7% HumanEval) and led open models on EvalPlus, LiveCodeBench, and BigCodeBench at release
128K-token context on the 7B/14B/32B models for repository-scale code understanding
Fill-in-the-middle (FIM) infilling makes it well suited to IDE/editor autocomplete, not just chat
Broad multilingual code coverage (40+ programming languages) and strong code-repair/editing performance
Widely distributed and easy to run locally via Ollama, Hugging Face, and ModelScope

Best for

Self-hosted code assistants and chat-based coding helpers where open weights matter
IDE/editor autocomplete and fill-in-the-middle completion (e.g. via the smaller 0.5B–7B models for low latency)
Code generation, debugging, and refactoring across many programming languages
Code repair and editing workflows (the 32B is competitive on Aider-style edit benchmarks)
Repository-scale code analysis using the 128K-token context on the 7B/14B/32B variants
Text-to-SQL and data/code tasks for cost-sensitive, high-volume workloads
On-device or edge deployment using the 0.5B/1.5B/3B models

How to access

Provider	Model ID
OpenRouter ↗	`qwen/qwen-2.5-coder-32b-instruct`
Ollama ↗	`qwen2.5-coder`
Hugging Face ↗	`Qwen/Qwen2.5-Coder-32B-Instruct`

Qwen-Coder — every version

The full lineage of the Qwen-Coder line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Qwen3-Codercurrent	2025-07-22	—	Apache-2.0
Qwen2.5-Coder	2024-11	—	Open weights

FAQ

Is Qwen2.5-Coder open source / free to use?

Yes. Qwen2.5-Coder is open weights and free to download and self-host. All sizes are released under Apache 2.0 except the 3B model, which uses the more restrictive Qwen Research license. Weights are on Hugging Face, ModelScope, and Ollama.

What model sizes does Qwen2.5-Coder come in?

Six dense sizes: 0.5B, 1.5B, 3B, 7B, 14B, and 32B, each with base and instruction-tuned (Instruct) variants. The 32B (Qwen2.5-Coder-32B-Instruct) is the flagship; the smaller sizes target local, edge, and low-latency use.

How does Qwen2.5-Coder-32B compare to GPT-4o?

At release in November 2024, Alibaba positioned Qwen2.5-Coder-32B-Instruct as a state-of-the-art open coding model with code-generation ability comparable to GPT-4o. It scored 92.7% on HumanEval and led open models on EvalPlus, LiveCodeBench, and BigCodeBench.

Has Qwen2.5-Coder been replaced?

It has been superseded as Alibaba's flagship coding model by Qwen3-Coder (released July 2025), and Alibaba's Model Studio now lists Qwen3-Coder rather than Qwen2.5-Coder. However, Qwen2.5-Coder weights remain freely available and widely used for self-hosting and via providers like OpenRouter and Ollama.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Qwen-Coder — every version

// FAQ