GLM-4.5-Air

Z.ai's lightweight 106B/12B MoE agentic model — near-flagship reasoning and tool use, MIT-licensed, runs on a single 24GB+ GPU.

Overview

GLM-4.5-Air is the compact member of Z.ai's (Zhipu) GLM-4.5 family, released on 28 July 2025 alongside the larger GLM-4.5 flagship. It is a Mixture-of-Experts model with 106 billion total parameters and 12 billion active per token, designed to deliver near-flagship reasoning, coding, and agentic ability at a fraction of the size and cost. The full GLM-4.5 sibling uses 355B total / 32B active; GLM-4.5-Air trades raw capacity for efficiency, fitting on a single high-memory GPU.

Like GLM-4.5, GLM-4.5-Air is a hybrid reasoning model: it exposes a 'thinking' mode for complex reasoning and tool use and a 'non-thinking' mode for fast, direct answers. It targets agent-centric applications — long-horizon coding, function calling, and tool orchestration — rather than just chat. The weights are published under the MIT license on Hugging Face and ModelScope, so the model can be downloaded, fine-tuned, and used commercially.

On Z.ai's own 12-benchmark aggregate covering agentic, reasoning, and coding (ARC) tasks, GLM-4.5-Air scores 59.8, close behind the 63.2 of the full GLM-4.5 while running far cheaper. It is the cost-efficient default in the GLM Air line and the direct predecessor to later Air/Flash-tier releases such as GLM-4.6 and GLM-4.7-Flash.

Released	2025-07-28
License	MIT
Weights	Open weights
Parameters	106B total · 12B active
Context	128K
Max output	96K
Architecture	Mixture-of-Experts (hybrid thinking / non-thinking)
Knowledge cutoff	Not disclosed
Modalities	Text
Status	Generally available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.20 / 1M tokens
Output	$1.10 / 1M tokens

Official Z.ai API list price for GLM-4.5-Air. Third-party hosts (e.g. OpenRouter at ~$0.13 / $0.85) and open weights are also available.

Pricing source ↗

Strengths

Strong agentic and tool-use ability for its size — built specifically for function calling and multi-step agent workflows
Open MIT-licensed weights: free to download, fine-tune, and use commercially via Hugging Face and ModelScope
Hybrid thinking / non-thinking modes let you trade reasoning depth for latency in a single checkpoint
Excellent math and reasoning for a 12B-active model (AIME 2024 89.4, MATH-500 98.1)
Much cheaper than the GLM-4.5 flagship ($0.20 / $1.10 vs $0.60 / $2.20 per 1M tokens) with close aggregate scores
Compact 106B MoE design runs on a single high-memory GPU, lowering self-hosting cost

Best for

Agentic coding assistants and autonomous coding agents that call tools over long horizons
Self-hosted deployments where open weights and commercial-friendly licensing matter
Function-calling and tool-orchestration backends needing reliable structured output
Cost-sensitive reasoning workloads (math, logic, analysis) that can't justify a flagship-size model
Fine-tuning and secondary development on a permissively-licensed base
Latency-flexible apps that switch between fast non-thinking replies and deeper thinking-mode reasoning

How to access

Provider	Model ID
Z.ai API Platform ↗	`glm-4.5-air`
OpenRouter ↗	`z-ai/glm-4.5-air`

FAQ

How big is GLM-4.5-Air?

GLM-4.5-Air is a Mixture-of-Experts model with 106 billion total parameters and 12 billion active per token. That makes it the compact sibling of the full GLM-4.5, which has 355B total / 32B active parameters.

Is GLM-4.5-Air open source?

Yes. Z.ai released the GLM-4.5-Air weights under the MIT license on Hugging Face and ModelScope. They can be downloaded, fine-tuned, and used commercially, including for secondary development.

What is GLM-4.5-Air's context window?

Z.ai's developer documentation lists a 128K-token context window and up to 96K tokens of output for GLM-4.5-Air. Some third-party hosts advertise the underlying 131,072-token (128K) limit.

How much does GLM-4.5-Air cost?

On the official Z.ai API, GLM-4.5-Air is priced at $0.20 per million input tokens and $1.10 per million output tokens — about a third of the GLM-4.5 flagship's $0.60 / $2.20. Open weights and cheaper third-party hosts (e.g. OpenRouter around $0.13 / $0.85) are also available.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// FAQ