Overview
GLM-5 is the flagship large language model from Z.ai (Zhipu), released on 2026-02-11 as the first model in the GLM-5 generation. It is a Mixture-of-Experts model with roughly 744B total parameters and about 40B active per token, scaling up from the 355B-class GLM-4.5/4.6 line. GLM-5 ships with open weights under the MIT license on Hugging Face (zai-org/GLM-5) and ModelScope, in both BF16 and FP8.
Z.ai positioned GLM-5 around the theme "from vibe coding to agentic engineering," pushing past quick prototyping toward long-horizon, multi-step engineering work: planning, coding, testing and fixing across large codebases. The architecture uses DeepSeek Sparse Attention (DSA) to keep long-context serving affordable, supports a 200K-token context window, and can generate up to 128K tokens in a single response with a dedicated thinking mode, function calling and structured output.
GLM-5 was pre-trained on about 28.5T tokens and, notably, trained entirely on Huawei Ascend hardware for supply-chain independence. It reports strong results on reasoning, coding and agentic benchmarks among open-weight models, and is priced aggressively through the official Z.ai API ($1.00 per million input tokens, $3.20 per million output).
| Released | 2026-02-11 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 744B total / 40B active (MoE) |
| Context | 200K |
| Max output | 128K |
| Architecture | Mixture-of-Experts (internal type glm_moe_dsa): 256 routed experts plus 1 shared, 8 experts activated per token, 78 layers, 744B total parameters with ~40B active. Uses DeepSeek Sparse Attention (DSA) to cut long-context deployment cost. Pre-trained on ~28.5T tokens, trained fully on Huawei Ascend hardware. Released in BF16 and FP8. |
| Modalities | Text |
| Status | Released |
Benchmarks
- SWE-bench Verified77.8%
- AIME 202692.7%
- GPQA-Diamond86%
- BrowseComp62%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $1.00 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.20 / 1M tokens per 1M tokens |
| Output | $3.20 / 1M tokens per 1M tokens |
Official Z.ai API pricing for model id glm-5. Open weights are also available under MIT for free self-hosting.
Strengths
- Open weights under the permissive MIT license — free to self-host, fine-tune and deploy commercially, with FP8 weights for cheaper inference
- Large MoE capacity (744B total / ~40B active) with DeepSeek Sparse Attention to keep long-context serving costs down
- Built for agentic, long-horizon engineering: planning, multi-file coding, testing and fixing rather than one-shot snippets
- 200K-token context with up to 128K-token output and a built-in thinking mode for extended reasoning
- Competitive open-model scores on reasoning and software-engineering benchmarks (SWE-bench Verified, AIME 2026, GPQA-Diamond)
- Aggressive official API pricing ($1.00 in / $3.20 out per million tokens, $0.20 cached input)
Best for
- Autonomous coding agents that plan, edit across many files, run tests and iterate on fixes
- Self-hosted or on-prem deployment where MIT-licensed open weights and data control matter
- Long-context analysis over large codebases, documents or logs up to ~200K tokens
- Math and science reasoning tasks (competition math, graduate-level QA)
- Tool-using and function-calling agents with structured JSON output
- Cost-sensitive production workloads that need frontier-class coding at a fraction of closed-model prices
How to access
| Provider | Model ID |
|---|---|
| Z.ai ↗ | glm-5 |
| OpenRouter ↗ | z-ai/glm-5 |
GLM (flagship) — every version
The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
Is GLM-5 open source?
GLM-5 ships with open weights under the permissive MIT license. You can download the weights from Hugging Face (zai-org/GLM-5) or ModelScope in BF16 and FP8, and self-host, fine-tune and deploy them commercially without restriction. Note that open weights is not the same as fully open-source training data or code, but the MIT terms on the released model are very permissive.
How big is GLM-5 and what is its architecture?
GLM-5 is a Mixture-of-Experts model with about 744B total parameters and roughly 40B active per token. It uses 256 routed experts plus one shared expert (8 activated per token) across 78 layers, and adopts DeepSeek Sparse Attention (DSA) to keep long-context inference affordable. It was pre-trained on about 28.5T tokens and trained entirely on Huawei Ascend hardware.
What context window and output length does GLM-5 support?
GLM-5 supports a 200K-token context window (about 202,752 positions) and can generate up to 128K tokens (131,072) in a single response, with a thinking mode for extended reasoning. It is a text-only model.
How much does GLM-5 cost to use via the API?
Through the official Z.ai API, GLM-5 (model id glm-5) costs $1.00 per million input tokens and $3.20 per million output tokens, with cached input at $0.20 per million. Because the weights are MIT-licensed, you can also run it yourself for the cost of your own hardware.