GLM-5.1

Z.ai's open-weight 754B MoE flagship — SOTA on SWE-Bench Pro and up to ~8 hours of autonomous agentic execution.

Overview

GLM-5.1 is the flagship large language model from Z.ai (formerly Zhipu AI), released on April 7, 2026 and open-sourced under the MIT license the following day. It is a ~754B-parameter Mixture-of-Experts model with roughly 40B parameters active per token, a 200K-token context window and up to 128K tokens of output. GLM-5.1 is an iteration on GLM-5 that keeps the same architecture and instead sharpens coding and agentic behaviour through refined reinforcement learning and alignment.

GLM-5.1 is built for long-horizon agentic engineering. Z.ai positions it to work autonomously on a single task for up to roughly 8 hours — planning, executing and iteratively improving its own output — and demonstrated it building a complete Linux desktop environment from scratch across hundreds of iterations. On SWE-Bench Pro it scores 58.4, which Z.ai reports as ahead of GPT-5.4 and Claude Opus 4.6, making GLM-5.1 one of the strongest open-weight coding and agent models at release.

Because the weights ship openly (BF16 and native FP8 checkpoints on Hugging Face under the MIT license), GLM-5.1 can be self-hosted via SGLang, vLLM, Transformers and similar runtimes, or used through Z.ai's hosted API and aggregators such as OpenRouter (model id z-ai/glm-5.1). Note that some Z.ai-reported April 7 benchmark figures were still awaiting full independent verification at release.

Released	2026-04-07
License	MIT
Weights	Open weights
Parameters	~754B total, 40B active (MoE)
Context	200K
Max output	128K
Architecture	Mixture-of-Experts transformer (~754B total parameters, ~40B active per token) with 256 routed experts (top-8 routing) plus 1 shared expert, Multi-head Latent Attention (MLA) combined with DeepSeek Sparse Attention (DSA), and a Multi-Token Prediction (MTP) head. Both BF16 and native FP8 checkpoints are published; thinking mode is enabled by default. GLM-5.1 keeps GLM-5's architecture and improves coding and agentic behaviour through refined reinforcement learning and alignment rather than additional pre-training.
Modalities	Text
Status	Available

Benchmarks

SWE-Bench Pro58.4%
AIME 202695.3%
HMMT Feb. 202682.6%
GPQA-Diamond86.2%
BrowseComp68%
MCP-Atlas (Public Set)71.8%
τ³-Bench70.6%
Terminal-Bench 2.063.5%
CyberGym68.7%
Artificial Analysis Intelligence Index40index

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$1.40 / 1M tokens per 1M tokens
Cached input	$0.26 / 1M tokens per 1M tokens
Output	$4.40 / 1M tokens per 1M tokens

Official Z.ai API pricing. Also available via OpenRouter (model id z-ai/glm-5.1) at roughly $0.98 input / $3.08 output per 1M tokens. Open weights (MIT) can be self-hosted at infrastructure cost.

Pricing source ↗

Strengths

State-of-the-art coding on SWE-Bench Pro (58.4) at release, reported ahead of GPT-5.4 and Claude Opus 4.6
Long-horizon autonomous execution — sustains a single task for up to ~8 hours across hundreds of iterations
Open weights under the permissive MIT license, with both BF16 and native FP8 checkpoints for self-hosting
Strong agentic and tool-use scores (BrowseComp 68.0, MCP-Atlas 71.8, τ³-Bench 70.6)
Strong math and reasoning (AIME 2026 95.3, GPQA-Diamond 86.2) with thinking mode on by default
200K context with up to 128K output, suited to large codebases and long agent traces

Best for

Long-running autonomous coding agents that plan, execute and self-correct over many iterations
Software engineering tasks: bug fixing, refactoring and repository-level changes (SWE-Bench-style work)
Agentic workflows with tool calling, browsing and MCP-based tool orchestration
Self-hosted deployment where open weights and a permissive MIT license are required
Math and STEM reasoning with extended thinking
Cost-sensitive frontier coding via open weights or competitively priced hosted APIs

How to access

Provider	Model ID
Z.ai ↗	`glm-5.1`
OpenRouter ↗	`z-ai/glm-5.1`
Hugging Face (weights) ↗	`zai-org/GLM-5.1`

GLM (flagship) — every version

The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
GLM-5.2current	2026-06-13	1M	MIT
GLM-5.1	2026-04-07	—	Open weights
GLM-5	2026-02-11	—	Apache-2.0
GLM-4.7	2025-12-22	—	Open weights
GLM-4.6	2025-09-30	—	MIT
GLM-4.5	2025-07-28	—	MIT

FAQ

Is GLM-5.1 open source?

Yes. Z.ai released GLM-5.1's weights on Hugging Face (zai-org/GLM-5.1) under the permissive MIT license, including both BF16 and native FP8 checkpoints. The MIT license allows you to download, inspect, modify, fine-tune and use the model commercially without restriction.

How big is GLM-5.1 and what is its context window?

GLM-5.1 is a Mixture-of-Experts model with roughly 754B total parameters and about 40B active per token. It has a 200K-token context window and can generate up to 128K output tokens. (Some trackers list the total as 744B, carried over from GLM-5; the official model card states ~754B.)

How does GLM-5.1 compare to GPT-5.4 and Claude Opus 4.6 on coding?

On SWE-Bench Pro, Z.ai reports GLM-5.1 scoring 58.4, ahead of GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) — making it state-of-the-art on that benchmark at release. Note that several April 7 figures were self-reported by Z.ai and were still pending full independent verification.

What is GLM-5.1 best at?

Long-horizon agentic engineering. Z.ai positions it to work autonomously on a single task for up to about 8 hours — planning, executing and iterating — and it posts strong coding, agentic tool-use (BrowseComp, MCP-Atlas, τ³-Bench) and math/reasoning (AIME 2026, GPQA-Diamond) scores.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// GLM (flagship) — every version

// FAQ