AI/TLDR

GLM-4.6

Z.ai's open-weight coding flagship: 357B-param MoE (32B active), 200K context, MIT-licensed.

Overview

GLM-4.6 is the September 30, 2025 flagship release from Z.ai (Zhipu AI's GLM line). It is an open-weight, MIT-licensed Mixture-of-Experts model with 357 billion total parameters and about 32 billion active per token. Its headline change over GLM-4.5 is a context window expanded from 128K to 200K tokens, which gives it more room for long agentic coding sessions and large-codebase work.

Z.ai positions GLM-4.6 primarily as a coding and agentic model. It ships with an optional thinking mode for extended reasoning and tool use, and Z.ai reports it is roughly 30% more token-efficient than GLM-4.5 on equivalent tasks. The weights are published on Hugging Face (zai-org/GLM-4.6), and the model is wired into popular coding agents such as Claude Code, Cline, Roo Code and Kilo Code.

Across public benchmarks GLM-4.6 posts strong reasoning and coding scores (for example 93.9% on AIME 2025 and 82.8% on LiveCodeBench v6) and, in Z.ai's CC-Bench human-evaluation tests, reaches a 48.6% win rate against Claude Sonnet 4. It sits as one of the stronger open-weight coding models of its generation, though Z.ai notes it still trails the very top closed coding models on some tasks.

Released2025-09-30
LicenseMIT
WeightsOpen weights
Parameters357B total / 32B active (MoE)
Context200K
Max output128K
ArchitectureMixture-of-Experts (MoE) transformer with 357B total parameters and roughly 32B active per token, supporting an optional "thinking" reasoning mode with tool use during inference. Released with open weights under the MIT license.
Knowledge cutoffMarch 2025
ModalitiesText
StatusAvailable

Benchmarks

  1. AIME 202593.9%
  2. LiveCodeBench v682.8%
  3. BrowseComp45.1%
  4. Terminal-Bench40.5%
  5. CC-Bench win rate vs Claude Sonnet 448.6%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.60 / 1M tokens per 1M tokens
Cached input$0.11 / 1M tokens per 1M tokens
Output$2.20 / 1M tokens per 1M tokens

Official Z.ai first-party API pricing. Open weights are also free to self-host under the MIT license.

Pricing source ↗

Strengths

  • Open weights under a permissive MIT license — free to self-host, fine-tune and deploy commercially
  • 200K-token context window, expanded from 128K in GLM-4.5, for long agentic coding and large-codebase tasks
  • Strong real-world coding and agentic performance; integrates with Claude Code, Cline, Roo Code and Kilo Code
  • About 30% more token-efficient than GLM-4.5 on equivalent tasks, lowering inference cost
  • Optional thinking mode for extended step-by-step reasoning and tool use
  • Competitive low API pricing ($0.60 input / $2.20 output per 1M tokens)

Best for

  • Agentic coding inside tools like Claude Code, Cline, Roo Code and Kilo Code
  • Long-context tasks: reasoning over large codebases or document sets up to 200K tokens
  • Self-hosted or private deployments where open weights and an MIT license are required
  • Front-end and full-stack code generation and refactoring
  • Math and reasoning workloads (AIME-style problems, competitive coding)
  • Cost-sensitive production workloads needing a capable open model at low per-token price

How to access

ProviderModel ID
Z.ai ↗glm-4.6
OpenRouter ↗z-ai/glm-4.6

GLM (flagship) — every version

The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
GLM-5.2current2026-06-131MMIT
GLM-5.12026-04-07Open weights
GLM-52026-02-11Apache-2.0
GLM-4.72025-12-22Open weights
GLM-4.62025-09-30MIT
GLM-4.52025-07-28MIT

FAQ

Is GLM-4.6 open source?

GLM-4.6 ships with open weights under the permissive MIT license. They are published on Hugging Face (zai-org/GLM-4.6), so you can self-host, fine-tune and deploy the model commercially for free; you only pay if you use Z.ai's hosted API.

What is GLM-4.6's context window?

GLM-4.6 has a 200K-token context window, expanded from 128K in GLM-4.5, with a maximum output of 128K tokens per response.

How big is GLM-4.6 and what architecture does it use?

It is a Mixture-of-Experts (MoE) model with 357 billion total parameters, of which roughly 32 billion are active per token. It also supports an optional thinking mode for extended reasoning and tool use.

How much does GLM-4.6 cost to use?

On Z.ai's first-party API, GLM-4.6 costs $0.60 per million input tokens and $2.20 per million output tokens, with cached input at $0.11 per million. Because the weights are open under MIT, self-hosting is also an option.