Overview
GLM-4.7 is the December 2025 flagship from Z.ai (the team behind the GLM / Zhipu models), released on 2025-12-22 under the permissive MIT license with open weights published on Hugging Face. It is a Mixture-of-Experts language model with roughly 355B total parameters and about 32B activated per token, so it runs far cheaper than its raw size suggests while still aiming at frontier-level coding and agent performance.
The model is built for real software-development and agentic workflows. GLM-4.7 reasons before it answers and before each tool call, keeps that reasoning across conversation turns, and lets you turn thinking on or off per request to trade speed for accuracy. Compared with GLM-4.6 it reports clear gains on coding (SWE-bench Verified rising to 73.8%), multi-step tool use, and harder reasoning, including a jump on the HLE exam.
GLM-4.7 handles text only, with a 200K-token context window and up to 128K tokens of output. Because the weights are MIT-licensed you can self-host or fine-tune it for commercial use; it is also served through Z.ai's own API and third-party routers such as OpenRouter and SiliconFlow.
| Released | 2025-12-22 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 355B total / 32B active (MoE) |
| Context | 200K |
| Max output | 128K |
| Architecture | Mixture-of-Experts (MoE) transformer. Roughly 355B total parameters with about 32B activated per token. Adds interleaved, preserved and turn-level "thinking" so the model can reason before each response and tool call, keep that reasoning across turns, and let developers toggle reasoning depth per request. |
| Modalities | Text |
| Status | Available |
Benchmarks
- SWE-bench Verified73.8%
- SWE-bench Multilingual66.7%
- Terminal Bench 2.041%
- LiveCodeBench v684.9%
- τ²-Bench (interactive tool use)84.7%
- HLE (Humanity's Last Exam, with tools)42.8%
- AIME 202595.7%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $1.40 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.26 / 1M tokens per 1M tokens |
| Output | $2.20 / 1M tokens per 1M tokens |
Official Z.ai API list price. Third-party routers list GLM-4.7 lower — e.g. OpenRouter at about $0.40 input / $1.75 output and SiliconFlow at about $0.42 input / $2.20 output. A free GLM-4.7-Flash tier is also offered.
Strengths
- Strong real-world coding: 73.8% on SWE-bench Verified and 84.9 on LiveCodeBench v6, competitive with leading closed models
- Stable agentic and tool-use behaviour over long multi-step tasks, with 84.7 on the τ²-Bench interactive tool-invocation benchmark
- Controllable reasoning — interleaved, preserved and per-turn 'thinking' lets you tune depth vs. latency
- Open MIT weights: self-host, fine-tune and use commercially with no usage gate
- Efficient for its scale — MoE keeps only ~32B of 355B parameters active per token
- Large 200K context with up to 128K output for big repositories and long agent traces
Best for
- Autonomous and semi-autonomous coding agents that resolve real GitHub issues across multiple files
- Long-horizon agentic workflows with repeated tool calls and preserved reasoning
- Self-hosted or fine-tuned deployments where MIT licensing and data control matter
- Math and complex-reasoning tasks (AIME-style problems, multi-step analysis)
- Terminal and developer-tooling automation
- Cost-sensitive production use that still needs near-frontier coding quality
How to access
| Provider | Model ID |
|---|---|
| Z.ai ↗ | glm-4.7 |
| OpenRouter ↗ | z-ai/glm-4.7 |
| SiliconFlow ↗ | zai-org/GLM-4.7 |
GLM (flagship) — every version
The full lineage of the GLM (flagship) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
Is GLM-4.7 open source?
The weights are open and released under the permissive MIT license on Hugging Face (zai-org/GLM-4.7), so you can download, self-host, fine-tune and use the model commercially. Z.ai calls it open-weight; the training data and full recipe are not published, which is typical for open-weight (rather than fully open-source) releases.
How big is GLM-4.7 and how fast is it?
It is a Mixture-of-Experts model with roughly 355B total parameters but only about 32B activated per token. That MoE design lets it reach near-frontier coding quality while keeping inference cost and latency far below what a dense 355B model would require.
What is GLM-4.7 best at?
Coding and agentic tool use. It reports 73.8% on SWE-bench Verified, 84.9 on LiveCodeBench v6 and 84.7 on the τ²-Bench tool-use benchmark, with controllable 'thinking' that improves multi-step, long-horizon tasks.
How much does GLM-4.7 cost to use?
Z.ai's official API lists it at about $1.40 per million input tokens and $2.20 per million output tokens, with cached input around $0.26. Third-party routers such as OpenRouter price it lower (roughly $0.40 input / $1.75 output), and a free GLM-4.7-Flash tier is available.