Overview
GLM-5.1 is the flagship large language model from Z.ai (formerly Zhipu AI), released on April 7, 2026 and open-sourced under the MIT license the following day. It is a ~754B-parameter Mixture-of-Experts model with roughly 40B parameters active per token, a 200K-token context window and up to 128K tokens of output. GLM-5.1 is an iteration on GLM-5 that keeps the same architecture and instead sharpens coding and agentic behaviour through refined reinforcement learning and alignment.
GLM-5.1 is built for long-horizon agentic engineering. Z.ai positions it to work autonomously on a single task for up to roughly 8 hours — planning, executing and iteratively improving its own output — and demonstrated it building a complete Linux desktop environment from scratch across hundreds of iterations. On SWE-Bench Pro it scores 58.4, which Z.ai reports as ahead of GPT-5.4 and Claude Opus 4.6, making GLM-5.1 one of the strongest open-weight coding and agent models at release.
Because the weights ship openly (BF16 and native FP8 checkpoints on Hugging Face under the MIT license), GLM-5.1 can be self-hosted via SGLang, vLLM, Transformers and similar runtimes, or used through Z.ai's hosted API and aggregators such as OpenRouter (model id z-ai/glm-5.1). Note that some Z.ai-reported April 7 benchmark figures were still awaiting full independent verification at release.
| Released | 2026-04-07 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | ~754B total, 40B active (MoE) |
| Context | 200K |
| Max output | 128K |
| Architecture | Mixture-of-Experts transformer (~754B total parameters, ~40B active per token) with 256 routed experts (top-8 routing) plus 1 shared expert, Multi-head Latent Attention (MLA) combined with DeepSeek Sparse Attention (DSA), and a Multi-Token Prediction (MTP) head. Both BF16 and native FP8 checkpoints are published; thinking mode is enabled by default. GLM-5.1 keeps GLM-5's architecture and improves coding and agentic behaviour through refined reinforcement learning and alignment rather than additional pre-training. |
| Modalities | Text |
| Status | Available |
Benchmarks
- SWE-Bench Pro58.4%
- AIME 202695.3%
- HMMT Feb. 202682.6%
- GPQA-Diamond86.2%
- BrowseComp68%
- MCP-Atlas (Public Set)71.8%
- τ³-Bench70.6%
- Terminal-Bench 2.063.5%
- CyberGym68.7%
- Artificial Analysis Intelligence Index40index
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $1.40 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.26 / 1M tokens per 1M tokens |
| Output | $4.40 / 1M tokens per 1M tokens |
Official Z.ai API pricing. Also available via OpenRouter (model id z-ai/glm-5.1) at roughly $0.98 input / $3.08 output per 1M tokens. Open weights (MIT) can be self-hosted at infrastructure cost.
Strengths
- State-of-the-art coding on SWE-Bench Pro (58.4) at release, reported ahead of GPT-5.4 and Claude Opus 4.6
- Long-horizon autonomous execution — sustains a single task for up to ~8 hours across hundreds of iterations
- Open weights under the permissive MIT license, with both BF16 and native FP8 checkpoints for self-hosting
- Strong agentic and tool-use scores (BrowseComp 68.0, MCP-Atlas 71.8, τ³-Bench 70.6)
- Strong math and reasoning (AIME 2026 95.3, GPQA-Diamond 86.2) with thinking mode on by default
- 200K context with up to 128K output, suited to large codebases and long agent traces
Best for
- Long-running autonomous coding agents that plan, execute and self-correct over many iterations
- Software engineering tasks: bug fixing, refactoring and repository-level changes (SWE-Bench-style work)
- Agentic workflows with tool calling, browsing and MCP-based tool orchestration
- Self-hosted deployment where open weights and a permissive MIT license are required
- Math and STEM reasoning with extended thinking
- Cost-sensitive frontier coding via open weights or competitively priced hosted APIs
How to access
| Provider | Model ID |
|---|---|
| Z.ai ↗ | glm-5.1 |
| OpenRouter ↗ | z-ai/glm-5.1 |
| Hugging Face (weights) ↗ | zai-org/GLM-5.1 |
GLM (flagship) — every version
The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
Is GLM-5.1 open source?
Yes. Z.ai released GLM-5.1's weights on Hugging Face (zai-org/GLM-5.1) under the permissive MIT license, including both BF16 and native FP8 checkpoints. The MIT license allows you to download, inspect, modify, fine-tune and use the model commercially without restriction.
How big is GLM-5.1 and what is its context window?
GLM-5.1 is a Mixture-of-Experts model with roughly 754B total parameters and about 40B active per token. It has a 200K-token context window and can generate up to 128K output tokens. (Some trackers list the total as 744B, carried over from GLM-5; the official model card states ~754B.)
How does GLM-5.1 compare to GPT-5.4 and Claude Opus 4.6 on coding?
On SWE-Bench Pro, Z.ai reports GLM-5.1 scoring 58.4, ahead of GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) — making it state-of-the-art on that benchmark at release. Note that several April 7 figures were self-reported by Z.ai and were still pending full independent verification.
What is GLM-5.1 best at?
Long-horizon agentic engineering. Z.ai positions it to work autonomously on a single task for up to about 8 hours — planning, executing and iterating — and it posts strong coding, agentic tool-use (BrowseComp, MCP-Atlas, τ³-Bench) and math/reasoning (AIME 2026, GPQA-Diamond) scores.