GLM-5

Z.ai's open-weight 744B MoE flagship — "from vibe coding to agentic engineering," MIT-licensed with a 200K context.

Overview

GLM-5 is the flagship large language model from Z.ai (Zhipu), released on 2026-02-11 as the first model in the GLM-5 generation. It is a Mixture-of-Experts model with roughly 744B total parameters and about 40B active per token, scaling up from the 355B-class GLM-4.5/4.6 line. GLM-5 ships with open weights under the MIT license on Hugging Face (zai-org/GLM-5) and ModelScope, in both BF16 and FP8.

Z.ai positioned GLM-5 around the theme "from vibe coding to agentic engineering," pushing past quick prototyping toward long-horizon, multi-step engineering work: planning, coding, testing and fixing across large codebases. The architecture uses DeepSeek Sparse Attention (DSA) to keep long-context serving affordable, supports a 200K-token context window, and can generate up to 128K tokens in a single response with a dedicated thinking mode, function calling and structured output.

GLM-5 was pre-trained on about 28.5T tokens and, notably, trained entirely on Huawei Ascend hardware for supply-chain independence. It reports strong results on reasoning, coding and agentic benchmarks among open-weight models, and is priced aggressively through the official Z.ai API ($1.00 per million input tokens, $3.20 per million output).

Released	2026-02-11
License	MIT
Weights	Open weights
Parameters	744B total / 40B active (MoE)
Context	200K
Max output	128K
Architecture	Mixture-of-Experts (internal type glm_moe_dsa): 256 routed experts plus 1 shared, 8 experts activated per token, 78 layers, 744B total parameters with ~40B active. Uses DeepSeek Sparse Attention (DSA) to cut long-context deployment cost. Pre-trained on ~28.5T tokens, trained fully on Huawei Ascend hardware. Released in BF16 and FP8.
Modalities	Text
Status	Released

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$1.00 / 1M tokens per 1M tokens
Cached input	$0.20 / 1M tokens per 1M tokens
Output	$3.20 / 1M tokens per 1M tokens

Official Z.ai API pricing for model id glm-5. Open weights are also available under MIT for free self-hosting.

Pricing source ↗

Strengths

Open weights under the permissive MIT license — free to self-host, fine-tune and deploy commercially, with FP8 weights for cheaper inference
Large MoE capacity (744B total / ~40B active) with DeepSeek Sparse Attention to keep long-context serving costs down
Built for agentic, long-horizon engineering: planning, multi-file coding, testing and fixing rather than one-shot snippets
200K-token context with up to 128K-token output and a built-in thinking mode for extended reasoning
Competitive open-model scores on reasoning and software-engineering benchmarks (SWE-bench Verified, AIME 2026, GPQA-Diamond)
Aggressive official API pricing ($1.00 in / $3.20 out per million tokens, $0.20 cached input)

Best for

Autonomous coding agents that plan, edit across many files, run tests and iterate on fixes
Self-hosted or on-prem deployment where MIT-licensed open weights and data control matter
Long-context analysis over large codebases, documents or logs up to ~200K tokens
Math and science reasoning tasks (competition math, graduate-level QA)
Tool-using and function-calling agents with structured JSON output
Cost-sensitive production workloads that need frontier-class coding at a fraction of closed-model prices

How to access

Provider	Model ID
Z.ai ↗	`glm-5`
OpenRouter ↗	`z-ai/glm-5`

GLM (flagship) — every version

The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
GLM-5.2current	2026-06-13	1M	MIT
GLM-5.1	2026-04-07	—	Open weights
GLM-5	2026-02-11	—	Apache-2.0
GLM-4.7	2025-12-22	—	Open weights
GLM-4.6	2025-09-30	—	MIT
GLM-4.5	2025-07-28	—	MIT

FAQ

Is GLM-5 open source?

GLM-5 ships with open weights under the permissive MIT license. You can download the weights from Hugging Face (zai-org/GLM-5) or ModelScope in BF16 and FP8, and self-host, fine-tune and deploy them commercially without restriction. Note that open weights is not the same as fully open-source training data or code, but the MIT terms on the released model are very permissive.

How big is GLM-5 and what is its architecture?

GLM-5 is a Mixture-of-Experts model with about 744B total parameters and roughly 40B active per token. It uses 256 routed experts plus one shared expert (8 activated per token) across 78 layers, and adopts DeepSeek Sparse Attention (DSA) to keep long-context inference affordable. It was pre-trained on about 28.5T tokens and trained entirely on Huawei Ascend hardware.

What context window and output length does GLM-5 support?

GLM-5 supports a 200K-token context window (about 202,752 positions) and can generate up to 128K tokens (131,072) in a single response, with a thinking mode for extended reasoning. It is a text-only model.

How much does GLM-5 cost to use via the API?

Through the official Z.ai API, GLM-5 (model id glm-5) costs $1.00 per million input tokens and $3.20 per million output tokens, with cached input at $0.20 per million. Because the weights are MIT-licensed, you can also run it yourself for the cost of your own hardware.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// GLM (flagship) — every version

// FAQ