GLM-4.7

Z.ai's open-weight 355B MoE flagship — a coding-and-agent model that thinks before it acts, with stronger reasoning and steadier long-horizon tool use.

Overview

GLM-4.7 is the December 2025 flagship from Z.ai (the team behind the GLM / Zhipu models), released on 2025-12-22 under the permissive MIT license with open weights published on Hugging Face. It is a Mixture-of-Experts language model with roughly 355B total parameters and about 32B activated per token, so it runs far cheaper than its raw size suggests while still aiming at frontier-level coding and agent performance.

The model is built for real software-development and agentic workflows. GLM-4.7 reasons before it answers and before each tool call, keeps that reasoning across conversation turns, and lets you turn thinking on or off per request to trade speed for accuracy. Compared with GLM-4.6 it reports clear gains on coding (SWE-bench Verified rising to 73.8%), multi-step tool use, and harder reasoning, including a jump on the HLE exam.

GLM-4.7 handles text only, with a 200K-token context window and up to 128K tokens of output. Because the weights are MIT-licensed you can self-host or fine-tune it for commercial use; it is also served through Z.ai's own API and third-party routers such as OpenRouter and SiliconFlow.

Released	2025-12-22
License	MIT
Weights	Open weights
Parameters	355B total / 32B active (MoE)
Context	200K
Max output	128K
Architecture	Mixture-of-Experts (MoE) transformer. Roughly 355B total parameters with about 32B activated per token. Adds interleaved, preserved and turn-level "thinking" so the model can reason before each response and tool call, keep that reasoning across turns, and let developers toggle reasoning depth per request.
Modalities	Text
Status	Available

Benchmarks

SWE-bench Verified73.8%
SWE-bench Multilingual66.7%
Terminal Bench 2.041%
LiveCodeBench v684.9%
τ²-Bench (interactive tool use)84.7%
HLE (Humanity's Last Exam, with tools)42.8%
AIME 202595.7%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$1.40 / 1M tokens per 1M tokens
Cached input	$0.26 / 1M tokens per 1M tokens
Output	$2.20 / 1M tokens per 1M tokens

Official Z.ai API list price. Third-party routers list GLM-4.7 lower — e.g. OpenRouter at about $0.40 input / $1.75 output and SiliconFlow at about $0.42 input / $2.20 output. A free GLM-4.7-Flash tier is also offered.

Pricing source ↗

Strengths

Strong real-world coding: 73.8% on SWE-bench Verified and 84.9 on LiveCodeBench v6, competitive with leading closed models
Stable agentic and tool-use behaviour over long multi-step tasks, with 84.7 on the τ²-Bench interactive tool-invocation benchmark
Controllable reasoning — interleaved, preserved and per-turn 'thinking' lets you tune depth vs. latency
Open MIT weights: self-host, fine-tune and use commercially with no usage gate
Efficient for its scale — MoE keeps only ~32B of 355B parameters active per token
Large 200K context with up to 128K output for big repositories and long agent traces

Best for

Autonomous and semi-autonomous coding agents that resolve real GitHub issues across multiple files
Long-horizon agentic workflows with repeated tool calls and preserved reasoning
Self-hosted or fine-tuned deployments where MIT licensing and data control matter
Math and complex-reasoning tasks (AIME-style problems, multi-step analysis)
Terminal and developer-tooling automation
Cost-sensitive production use that still needs near-frontier coding quality

How to access

Provider	Model ID
Z.ai ↗	`glm-4.7`
OpenRouter ↗	`z-ai/glm-4.7`
SiliconFlow ↗	`zai-org/GLM-4.7`

GLM (flagship) — every version

The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
GLM-5.2current	2026-06-13	1M	MIT
GLM-5.1	2026-04-07	—	Open weights
GLM-5	2026-02-11	—	Apache-2.0
GLM-4.7	2025-12-22	—	Open weights
GLM-4.6	2025-09-30	—	MIT
GLM-4.5	2025-07-28	—	MIT

FAQ

Is GLM-4.7 open source?

The weights are open and released under the permissive MIT license on Hugging Face (zai-org/GLM-4.7), so you can download, self-host, fine-tune and use the model commercially. Z.ai calls it open-weight; the training data and full recipe are not published, which is typical for open-weight (rather than fully open-source) releases.

How big is GLM-4.7 and how fast is it?

It is a Mixture-of-Experts model with roughly 355B total parameters but only about 32B activated per token. That MoE design lets it reach near-frontier coding quality while keeping inference cost and latency far below what a dense 355B model would require.

What is GLM-4.7 best at?

Coding and agentic tool use. It reports 73.8% on SWE-bench Verified, 84.9 on LiveCodeBench v6 and 84.7 on the τ²-Bench tool-use benchmark, with controllable 'thinking' that improves multi-step, long-horizon tasks.

How much does GLM-4.7 cost to use?

Z.ai's official API lists it at about $1.40 per million input tokens and $2.20 per million output tokens, with cached input around $0.26. Third-party routers such as OpenRouter price it lower (roughly $0.40 input / $1.75 output), and a free GLM-4.7-Flash tier is available.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// GLM (flagship) — every version

// FAQ