Overview
GLM-4.5-Air is the compact member of Z.ai's (Zhipu) GLM-4.5 family, released on 28 July 2025 alongside the larger GLM-4.5 flagship. It is a Mixture-of-Experts model with 106 billion total parameters and 12 billion active per token, designed to deliver near-flagship reasoning, coding, and agentic ability at a fraction of the size and cost. The full GLM-4.5 sibling uses 355B total / 32B active; GLM-4.5-Air trades raw capacity for efficiency, fitting on a single high-memory GPU.
Like GLM-4.5, GLM-4.5-Air is a hybrid reasoning model: it exposes a 'thinking' mode for complex reasoning and tool use and a 'non-thinking' mode for fast, direct answers. It targets agent-centric applications — long-horizon coding, function calling, and tool orchestration — rather than just chat. The weights are published under the MIT license on Hugging Face and ModelScope, so the model can be downloaded, fine-tuned, and used commercially.
On Z.ai's own 12-benchmark aggregate covering agentic, reasoning, and coding (ARC) tasks, GLM-4.5-Air scores 59.8, close behind the 63.2 of the full GLM-4.5 while running far cheaper. It is the cost-efficient default in the GLM Air line and the direct predecessor to later Air/Flash-tier releases such as GLM-4.6 and GLM-4.7-Flash.
| Released | 2025-07-28 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 106B total · 12B active |
| Context | 128K |
| Max output | 96K |
| Architecture | Mixture-of-Experts (hybrid thinking / non-thinking) |
| Knowledge cutoff | Not disclosed |
| Modalities | Text |
| Status | Generally available |
Benchmarks
- MMLU-Pro81.4%
- AIME 202489.4%
- MATH-50098.1%
- GPQA Diamond71.72%
- BFCL-v3 (function calling)76.4%
- LiveCodeBench70.7%
- SWE-bench Verified57.6%
- TAU-bench (Airline)60.8%
- ARC aggregate (12 benchmarks)59.8
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.20 / 1M tokens |
|---|---|
| Output | $1.10 / 1M tokens |
Official Z.ai API list price for GLM-4.5-Air. Third-party hosts (e.g. OpenRouter at ~$0.13 / $0.85) and open weights are also available.
Strengths
- Strong agentic and tool-use ability for its size — built specifically for function calling and multi-step agent workflows
- Open MIT-licensed weights: free to download, fine-tune, and use commercially via Hugging Face and ModelScope
- Hybrid thinking / non-thinking modes let you trade reasoning depth for latency in a single checkpoint
- Excellent math and reasoning for a 12B-active model (AIME 2024 89.4, MATH-500 98.1)
- Much cheaper than the GLM-4.5 flagship ($0.20 / $1.10 vs $0.60 / $2.20 per 1M tokens) with close aggregate scores
- Compact 106B MoE design runs on a single high-memory GPU, lowering self-hosting cost
Best for
- Agentic coding assistants and autonomous coding agents that call tools over long horizons
- Self-hosted deployments where open weights and commercial-friendly licensing matter
- Function-calling and tool-orchestration backends needing reliable structured output
- Cost-sensitive reasoning workloads (math, logic, analysis) that can't justify a flagship-size model
- Fine-tuning and secondary development on a permissively-licensed base
- Latency-flexible apps that switch between fast non-thinking replies and deeper thinking-mode reasoning
How to access
| Provider | Model ID |
|---|---|
| Z.ai API Platform ↗ | glm-4.5-air |
| OpenRouter ↗ | z-ai/glm-4.5-air |
FAQ
How big is GLM-4.5-Air?
GLM-4.5-Air is a Mixture-of-Experts model with 106 billion total parameters and 12 billion active per token. That makes it the compact sibling of the full GLM-4.5, which has 355B total / 32B active parameters.
Is GLM-4.5-Air open source?
Yes. Z.ai released the GLM-4.5-Air weights under the MIT license on Hugging Face and ModelScope. They can be downloaded, fine-tuned, and used commercially, including for secondary development.
What is GLM-4.5-Air's context window?
Z.ai's developer documentation lists a 128K-token context window and up to 96K tokens of output for GLM-4.5-Air. Some third-party hosts advertise the underlying 131,072-token (128K) limit.
How much does GLM-4.5-Air cost?
On the official Z.ai API, GLM-4.5-Air is priced at $0.20 per million input tokens and $1.10 per million output tokens — about a third of the GLM-4.5 flagship's $0.60 / $2.20. Open weights and cheaper third-party hosts (e.g. OpenRouter around $0.13 / $0.85) are also available.