Overview
GLM-4.6 is the September 30, 2025 flagship release from Z.ai (Zhipu AI's GLM line). It is an open-weight, MIT-licensed Mixture-of-Experts model with 357 billion total parameters and about 32 billion active per token. Its headline change over GLM-4.5 is a context window expanded from 128K to 200K tokens, which gives it more room for long agentic coding sessions and large-codebase work.
Z.ai positions GLM-4.6 primarily as a coding and agentic model. It ships with an optional thinking mode for extended reasoning and tool use, and Z.ai reports it is roughly 30% more token-efficient than GLM-4.5 on equivalent tasks. The weights are published on Hugging Face (zai-org/GLM-4.6), and the model is wired into popular coding agents such as Claude Code, Cline, Roo Code and Kilo Code.
Across public benchmarks GLM-4.6 posts strong reasoning and coding scores (for example 93.9% on AIME 2025 and 82.8% on LiveCodeBench v6) and, in Z.ai's CC-Bench human-evaluation tests, reaches a 48.6% win rate against Claude Sonnet 4. It sits as one of the stronger open-weight coding models of its generation, though Z.ai notes it still trails the very top closed coding models on some tasks.
| Released | 2025-09-30 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 357B total / 32B active (MoE) |
| Context | 200K |
| Max output | 128K |
| Architecture | Mixture-of-Experts (MoE) transformer with 357B total parameters and roughly 32B active per token, supporting an optional "thinking" reasoning mode with tool use during inference. Released with open weights under the MIT license. |
| Knowledge cutoff | March 2025 |
| Modalities | Text |
| Status | Available |
Benchmarks
- AIME 202593.9%
- LiveCodeBench v682.8%
- BrowseComp45.1%
- Terminal-Bench40.5%
- CC-Bench win rate vs Claude Sonnet 448.6%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.60 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.11 / 1M tokens per 1M tokens |
| Output | $2.20 / 1M tokens per 1M tokens |
Official Z.ai first-party API pricing. Open weights are also free to self-host under the MIT license.
Strengths
- Open weights under a permissive MIT license — free to self-host, fine-tune and deploy commercially
- 200K-token context window, expanded from 128K in GLM-4.5, for long agentic coding and large-codebase tasks
- Strong real-world coding and agentic performance; integrates with Claude Code, Cline, Roo Code and Kilo Code
- About 30% more token-efficient than GLM-4.5 on equivalent tasks, lowering inference cost
- Optional thinking mode for extended step-by-step reasoning and tool use
- Competitive low API pricing ($0.60 input / $2.20 output per 1M tokens)
Best for
- Agentic coding inside tools like Claude Code, Cline, Roo Code and Kilo Code
- Long-context tasks: reasoning over large codebases or document sets up to 200K tokens
- Self-hosted or private deployments where open weights and an MIT license are required
- Front-end and full-stack code generation and refactoring
- Math and reasoning workloads (AIME-style problems, competitive coding)
- Cost-sensitive production workloads needing a capable open model at low per-token price
How to access
| Provider | Model ID |
|---|---|
| Z.ai ↗ | glm-4.6 |
| OpenRouter ↗ | z-ai/glm-4.6 |
GLM (flagship) — every version
The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
Is GLM-4.6 open source?
GLM-4.6 ships with open weights under the permissive MIT license. They are published on Hugging Face (zai-org/GLM-4.6), so you can self-host, fine-tune and deploy the model commercially for free; you only pay if you use Z.ai's hosted API.
What is GLM-4.6's context window?
GLM-4.6 has a 200K-token context window, expanded from 128K in GLM-4.5, with a maximum output of 128K tokens per response.
How big is GLM-4.6 and what architecture does it use?
It is a Mixture-of-Experts (MoE) model with 357 billion total parameters, of which roughly 32 billion are active per token. It also supports an optional thinking mode for extended reasoning and tool use.
How much does GLM-4.6 cost to use?
On Z.ai's first-party API, GLM-4.6 costs $0.60 per million input tokens and $2.20 per million output tokens, with cached input at $0.11 per million. Because the weights are open under MIT, self-hosting is also an option.