GLM-4.6

Z.ai's open-weight coding flagship: 357B-param MoE (32B active), 200K context, MIT-licensed.

Overview

GLM-4.6 is the September 30, 2025 flagship release from Z.ai (Zhipu AI's GLM line). It is an open-weight, MIT-licensed Mixture-of-Experts model with 357 billion total parameters and about 32 billion active per token. Its headline change over GLM-4.5 is a context window expanded from 128K to 200K tokens, which gives it more room for long agentic coding sessions and large-codebase work.

Z.ai positions GLM-4.6 primarily as a coding and agentic model. It ships with an optional thinking mode for extended reasoning and tool use, and Z.ai reports it is roughly 30% more token-efficient than GLM-4.5 on equivalent tasks. The weights are published on Hugging Face (zai-org/GLM-4.6), and the model is wired into popular coding agents such as Claude Code, Cline, Roo Code and Kilo Code.

Across public benchmarks GLM-4.6 posts strong reasoning and coding scores (for example 93.9% on AIME 2025 and 82.8% on LiveCodeBench v6) and, in Z.ai's CC-Bench human-evaluation tests, reaches a 48.6% win rate against Claude Sonnet 4. It sits as one of the stronger open-weight coding models of its generation, though Z.ai notes it still trails the very top closed coding models on some tasks.

Released	2025-09-30
License	MIT
Weights	Open weights
Parameters	357B total / 32B active (MoE)
Context	200K
Max output	128K
Architecture	Mixture-of-Experts (MoE) transformer with 357B total parameters and roughly 32B active per token, supporting an optional "thinking" reasoning mode with tool use during inference. Released with open weights under the MIT license.
Knowledge cutoff	March 2025
Modalities	Text
Status	Available

Benchmarks

AIME 202593.9%
LiveCodeBench v682.8%
BrowseComp45.1%
Terminal-Bench40.5%
CC-Bench win rate vs Claude Sonnet 448.6%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.60 / 1M tokens per 1M tokens
Cached input	$0.11 / 1M tokens per 1M tokens
Output	$2.20 / 1M tokens per 1M tokens

Official Z.ai first-party API pricing. Open weights are also free to self-host under the MIT license.

Pricing source ↗

Strengths

Open weights under a permissive MIT license — free to self-host, fine-tune and deploy commercially
200K-token context window, expanded from 128K in GLM-4.5, for long agentic coding and large-codebase tasks
Strong real-world coding and agentic performance; integrates with Claude Code, Cline, Roo Code and Kilo Code
About 30% more token-efficient than GLM-4.5 on equivalent tasks, lowering inference cost
Optional thinking mode for extended step-by-step reasoning and tool use
Competitive low API pricing ($0.60 input / $2.20 output per 1M tokens)

Best for

Agentic coding inside tools like Claude Code, Cline, Roo Code and Kilo Code
Long-context tasks: reasoning over large codebases or document sets up to 200K tokens
Self-hosted or private deployments where open weights and an MIT license are required
Front-end and full-stack code generation and refactoring
Math and reasoning workloads (AIME-style problems, competitive coding)
Cost-sensitive production workloads needing a capable open model at low per-token price

How to access

Provider	Model ID
Z.ai ↗	`glm-4.6`
OpenRouter ↗	`z-ai/glm-4.6`

GLM (flagship) — every version

The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
GLM-5.2current	2026-06-13	1M	MIT
GLM-5.1	2026-04-07	—	Open weights
GLM-5	2026-02-11	—	Apache-2.0
GLM-4.7	2025-12-22	—	Open weights
GLM-4.6	2025-09-30	—	MIT
GLM-4.5	2025-07-28	—	MIT

FAQ

Is GLM-4.6 open source?

GLM-4.6 ships with open weights under the permissive MIT license. They are published on Hugging Face (zai-org/GLM-4.6), so you can self-host, fine-tune and deploy the model commercially for free; you only pay if you use Z.ai's hosted API.

What is GLM-4.6's context window?

GLM-4.6 has a 200K-token context window, expanded from 128K in GLM-4.5, with a maximum output of 128K tokens per response.

How big is GLM-4.6 and what architecture does it use?

It is a Mixture-of-Experts (MoE) model with 357 billion total parameters, of which roughly 32 billion are active per token. It also supports an optional thinking mode for extended reasoning and tool use.

How much does GLM-4.6 cost to use?

On Z.ai's first-party API, GLM-4.6 costs $0.60 per million input tokens and $2.20 per million output tokens, with cached input at $0.11 per million. Because the weights are open under MIT, self-hosting is also an option.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// GLM (flagship) — every version

// FAQ