AI/TLDR

Qwen2.5-Coder

Alibaba's open-weight code LLM family (0.5B–32B); the 32B matched GPT-4o on coding

Overview

Qwen2.5-Coder is the code-specialized LLM family from Alibaba's Qwen team, released on November 12, 2024. It spans six dense model sizes — 0.5B, 1.5B, 3B, 7B, 14B, and 32B — each available in base and instruction-tuned (Instruct) variants, so the same architecture covers everything from on-device autocomplete to a server-grade coding assistant. All sizes are open weights under Apache 2.0, except the 3B model, which is released under the Qwen Research license.

The models build on Qwen2.5 and were continued-pretrained on 5.5 trillion tokens of source code, text-code grounding data, synthetic data, math, and general text. The flagship Qwen2.5-Coder-32B-Instruct was positioned by Alibaba as a state-of-the-art open coding model of its generation, with code-generation ability comparable to GPT-4o: it scored 92.7% on HumanEval and led open models on EvalPlus, LiveCodeBench, and BigCodeBench. The 7B and 14B variants offer strong, cheaper alternatives, while the 0.5B–3B models target latency-sensitive and local use.

Qwen2.5-Coder supports up to 128K-token context (131,072 tokens, via YaRN) on the 7B/14B/32B models and 32K on the smaller sizes, plus fill-in-the-middle infilling for editor-style completion. Weights are distributed on Hugging Face, ModelScope, and Ollama, and the models are reachable through hosted APIs such as OpenRouter. Alibaba's Qwen-Coder line later continued with Qwen3-Coder in July 2025, but Qwen2.5-Coder remains a popular, freely self-hostable option.

Released2024-11-12
LicenseApache 2.0 (all sizes except the 3B model, which uses the Qwen Research license)
WeightsOpen weights
ParametersFamily of 6 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B (flagship 32B = 32.5B total / 31.0B non-embedding)
Context128K (131,072 tokens) for 7B/14B/32B; 32K for 0.5B/1.5B/3B
Max output8,192 tokens (default generation); hosted endpoints commonly cap output at 32,768
ArchitectureDense, decoder-only causal language model based on the Qwen2.5 architecture, using RoPE positional embeddings, SwiGLU activations, RMSNorm, and grouped-query attention with QKV bias. The flagship Qwen2.5-Coder-32B has 64 layers, 40 query heads and 8 key-value heads (GQA), and a hidden size of 5,120 over a ~151.6K-token vocabulary. Continued pre-training on 5.5 trillion tokens (roughly a 70:20:10 mix of code, text, and math), with file-level training at 8,192 tokens, repo-level training at 32,768 tokens, and extension to 128K (131,072 tokens) via YaRN. Supports fill-in-the-middle (FIM) infilling.
Knowledge cutoffNot officially published for Qwen2.5-Coder; omitted to avoid guessing
ModalitiesText
StatusAvailable (open weights). Superseded by Qwen3-Coder (July 2025) as Alibaba's flagship coding line, but Qwen2.5-Coder weights remain downloadable and widely used.

Benchmarks

  1. HumanEval (32B-Instruct, Pass@1)92.7%
  2. HumanEval+ / EvalPlus (32B-Instruct)87.2%
  3. MBPP (32B-Instruct, Pass@1)90.2%
  4. MBPP+ / EvalPlus (32B-Instruct)75.1%
  5. BigCodeBench-Instruct Full (32B-Instruct)49.6%
  6. LiveCodeBench (32B-Instruct, Pass@1)31.4%
  7. Aider code editing (32B-Instruct, Pass@2)73.7%
  8. HumanEval (7B-Instruct, Pass@1)88.4%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.66 / 1M tokens per 1M tokens
Output$1.00 / 1M tokens per 1M tokens

Qwen2.5-Coder is open weights and free to self-host. Rates shown are for the hosted Qwen2.5-Coder-32B-Instruct endpoint on OpenRouter; other providers and the smaller sizes vary. Alibaba's own Model Studio now lists Qwen3-Coder rather than Qwen2.5-Coder.

Pricing source ↗

Strengths

  • Open weights under Apache 2.0 (except the 3B, which uses the Qwen Research license) — self-hostable and commercial-friendly
  • Six sizes from 0.5B to 32B let teams trade off quality, cost, and latency on the same architecture
  • Flagship 32B-Instruct rivaled GPT-4o on code generation (92.7% HumanEval) and led open models on EvalPlus, LiveCodeBench, and BigCodeBench at release
  • 128K-token context on the 7B/14B/32B models for repository-scale code understanding
  • Fill-in-the-middle (FIM) infilling makes it well suited to IDE/editor autocomplete, not just chat
  • Broad multilingual code coverage (40+ programming languages) and strong code-repair/editing performance
  • Widely distributed and easy to run locally via Ollama, Hugging Face, and ModelScope

Best for

  • Self-hosted code assistants and chat-based coding helpers where open weights matter
  • IDE/editor autocomplete and fill-in-the-middle completion (e.g. via the smaller 0.5B–7B models for low latency)
  • Code generation, debugging, and refactoring across many programming languages
  • Code repair and editing workflows (the 32B is competitive on Aider-style edit benchmarks)
  • Repository-scale code analysis using the 128K-token context on the 7B/14B/32B variants
  • Text-to-SQL and data/code tasks for cost-sensitive, high-volume workloads
  • On-device or edge deployment using the 0.5B/1.5B/3B models

How to access

ProviderModel ID
OpenRouter ↗qwen/qwen-2.5-coder-32b-instruct
Ollama ↗qwen2.5-coder
Hugging Face ↗Qwen/Qwen2.5-Coder-32B-Instruct

Qwen-Coder — every version

The full lineage of the Qwen-Coder line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Qwen3-Codercurrent2025-07-22Apache-2.0
Qwen2.5-Coder2024-11Open weights

FAQ

Is Qwen2.5-Coder open source / free to use?

Yes. Qwen2.5-Coder is open weights and free to download and self-host. All sizes are released under Apache 2.0 except the 3B model, which uses the more restrictive Qwen Research license. Weights are on Hugging Face, ModelScope, and Ollama.

What model sizes does Qwen2.5-Coder come in?

Six dense sizes: 0.5B, 1.5B, 3B, 7B, 14B, and 32B, each with base and instruction-tuned (Instruct) variants. The 32B (Qwen2.5-Coder-32B-Instruct) is the flagship; the smaller sizes target local, edge, and low-latency use.

How does Qwen2.5-Coder-32B compare to GPT-4o?

At release in November 2024, Alibaba positioned Qwen2.5-Coder-32B-Instruct as a state-of-the-art open coding model with code-generation ability comparable to GPT-4o. It scored 92.7% on HumanEval and led open models on EvalPlus, LiveCodeBench, and BigCodeBench.

Has Qwen2.5-Coder been replaced?

It has been superseded as Alibaba's flagship coding model by Qwen3-Coder (released July 2025), and Alibaba's Model Studio now lists Qwen3-Coder rather than Qwen2.5-Coder. However, Qwen2.5-Coder weights remain freely available and widely used for self-hosting and via providers like OpenRouter and Ollama.