AI/TLDR

GLM-5.1

Z.ai's open-weight 754B MoE flagship — SOTA on SWE-Bench Pro and up to ~8 hours of autonomous agentic execution.

Overview

GLM-5.1 is the flagship large language model from Z.ai (formerly Zhipu AI), released on April 7, 2026 and open-sourced under the MIT license the following day. It is a ~754B-parameter Mixture-of-Experts model with roughly 40B parameters active per token, a 200K-token context window and up to 128K tokens of output. GLM-5.1 is an iteration on GLM-5 that keeps the same architecture and instead sharpens coding and agentic behaviour through refined reinforcement learning and alignment.

GLM-5.1 is built for long-horizon agentic engineering. Z.ai positions it to work autonomously on a single task for up to roughly 8 hours — planning, executing and iteratively improving its own output — and demonstrated it building a complete Linux desktop environment from scratch across hundreds of iterations. On SWE-Bench Pro it scores 58.4, which Z.ai reports as ahead of GPT-5.4 and Claude Opus 4.6, making GLM-5.1 one of the strongest open-weight coding and agent models at release.

Because the weights ship openly (BF16 and native FP8 checkpoints on Hugging Face under the MIT license), GLM-5.1 can be self-hosted via SGLang, vLLM, Transformers and similar runtimes, or used through Z.ai's hosted API and aggregators such as OpenRouter (model id z-ai/glm-5.1). Note that some Z.ai-reported April 7 benchmark figures were still awaiting full independent verification at release.

Released2026-04-07
LicenseMIT
WeightsOpen weights
Parameters~754B total, 40B active (MoE)
Context200K
Max output128K
ArchitectureMixture-of-Experts transformer (~754B total parameters, ~40B active per token) with 256 routed experts (top-8 routing) plus 1 shared expert, Multi-head Latent Attention (MLA) combined with DeepSeek Sparse Attention (DSA), and a Multi-Token Prediction (MTP) head. Both BF16 and native FP8 checkpoints are published; thinking mode is enabled by default. GLM-5.1 keeps GLM-5's architecture and improves coding and agentic behaviour through refined reinforcement learning and alignment rather than additional pre-training.
ModalitiesText
StatusAvailable

Benchmarks

  1. SWE-Bench Pro58.4%
  2. AIME 202695.3%
  3. HMMT Feb. 202682.6%
  4. GPQA-Diamond86.2%
  5. BrowseComp68%
  6. MCP-Atlas (Public Set)71.8%
  7. τ³-Bench70.6%
  8. Terminal-Bench 2.063.5%
  9. CyberGym68.7%
  10. Artificial Analysis Intelligence Index40index

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$1.40 / 1M tokens per 1M tokens
Cached input$0.26 / 1M tokens per 1M tokens
Output$4.40 / 1M tokens per 1M tokens

Official Z.ai API pricing. Also available via OpenRouter (model id z-ai/glm-5.1) at roughly $0.98 input / $3.08 output per 1M tokens. Open weights (MIT) can be self-hosted at infrastructure cost.

Pricing source ↗

Strengths

  • State-of-the-art coding on SWE-Bench Pro (58.4) at release, reported ahead of GPT-5.4 and Claude Opus 4.6
  • Long-horizon autonomous execution — sustains a single task for up to ~8 hours across hundreds of iterations
  • Open weights under the permissive MIT license, with both BF16 and native FP8 checkpoints for self-hosting
  • Strong agentic and tool-use scores (BrowseComp 68.0, MCP-Atlas 71.8, τ³-Bench 70.6)
  • Strong math and reasoning (AIME 2026 95.3, GPQA-Diamond 86.2) with thinking mode on by default
  • 200K context with up to 128K output, suited to large codebases and long agent traces

Best for

  • Long-running autonomous coding agents that plan, execute and self-correct over many iterations
  • Software engineering tasks: bug fixing, refactoring and repository-level changes (SWE-Bench-style work)
  • Agentic workflows with tool calling, browsing and MCP-based tool orchestration
  • Self-hosted deployment where open weights and a permissive MIT license are required
  • Math and STEM reasoning with extended thinking
  • Cost-sensitive frontier coding via open weights or competitively priced hosted APIs

How to access

ProviderModel ID
Z.ai ↗glm-5.1
OpenRouter ↗z-ai/glm-5.1
Hugging Face (weights) ↗zai-org/GLM-5.1

GLM (flagship) — every version

The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
GLM-5.2current2026-06-131MMIT
GLM-5.12026-04-07Open weights
GLM-52026-02-11Apache-2.0
GLM-4.72025-12-22Open weights
GLM-4.62025-09-30MIT
GLM-4.52025-07-28MIT

FAQ

Is GLM-5.1 open source?

Yes. Z.ai released GLM-5.1's weights on Hugging Face (zai-org/GLM-5.1) under the permissive MIT license, including both BF16 and native FP8 checkpoints. The MIT license allows you to download, inspect, modify, fine-tune and use the model commercially without restriction.

How big is GLM-5.1 and what is its context window?

GLM-5.1 is a Mixture-of-Experts model with roughly 754B total parameters and about 40B active per token. It has a 200K-token context window and can generate up to 128K output tokens. (Some trackers list the total as 744B, carried over from GLM-5; the official model card states ~754B.)

How does GLM-5.1 compare to GPT-5.4 and Claude Opus 4.6 on coding?

On SWE-Bench Pro, Z.ai reports GLM-5.1 scoring 58.4, ahead of GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) — making it state-of-the-art on that benchmark at release. Note that several April 7 figures were self-reported by Z.ai and were still pending full independent verification.

What is GLM-5.1 best at?

Long-horizon agentic engineering. Z.ai positions it to work autonomously on a single task for up to about 8 hours — planning, executing and iterating — and it posts strong coding, agentic tool-use (BrowseComp, MCP-Atlas, τ³-Bench) and math/reasoning (AIME 2026, GPQA-Diamond) scores.