AI/TLDR

GLM-4.5-Air

Z.ai's lightweight 106B/12B MoE agentic model — near-flagship reasoning and tool use, MIT-licensed, runs on a single 24GB+ GPU.

Overview

GLM-4.5-Air is the compact member of Z.ai's (Zhipu) GLM-4.5 family, released on 28 July 2025 alongside the larger GLM-4.5 flagship. It is a Mixture-of-Experts model with 106 billion total parameters and 12 billion active per token, designed to deliver near-flagship reasoning, coding, and agentic ability at a fraction of the size and cost. The full GLM-4.5 sibling uses 355B total / 32B active; GLM-4.5-Air trades raw capacity for efficiency, fitting on a single high-memory GPU.

Like GLM-4.5, GLM-4.5-Air is a hybrid reasoning model: it exposes a 'thinking' mode for complex reasoning and tool use and a 'non-thinking' mode for fast, direct answers. It targets agent-centric applications — long-horizon coding, function calling, and tool orchestration — rather than just chat. The weights are published under the MIT license on Hugging Face and ModelScope, so the model can be downloaded, fine-tuned, and used commercially.

On Z.ai's own 12-benchmark aggregate covering agentic, reasoning, and coding (ARC) tasks, GLM-4.5-Air scores 59.8, close behind the 63.2 of the full GLM-4.5 while running far cheaper. It is the cost-efficient default in the GLM Air line and the direct predecessor to later Air/Flash-tier releases such as GLM-4.6 and GLM-4.7-Flash.

Released2025-07-28
LicenseMIT
WeightsOpen weights
Parameters106B total · 12B active
Context128K
Max output96K
ArchitectureMixture-of-Experts (hybrid thinking / non-thinking)
Knowledge cutoffNot disclosed
ModalitiesText
StatusGenerally available

Benchmarks

  1. MMLU-Pro81.4%
  2. AIME 202489.4%
  3. MATH-50098.1%
  4. GPQA Diamond71.72%
  5. BFCL-v3 (function calling)76.4%
  6. LiveCodeBench70.7%
  7. SWE-bench Verified57.6%
  8. TAU-bench (Airline)60.8%
  9. ARC aggregate (12 benchmarks)59.8

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.20 / 1M tokens
Output$1.10 / 1M tokens

Official Z.ai API list price for GLM-4.5-Air. Third-party hosts (e.g. OpenRouter at ~$0.13 / $0.85) and open weights are also available.

Pricing source ↗

Strengths

  • Strong agentic and tool-use ability for its size — built specifically for function calling and multi-step agent workflows
  • Open MIT-licensed weights: free to download, fine-tune, and use commercially via Hugging Face and ModelScope
  • Hybrid thinking / non-thinking modes let you trade reasoning depth for latency in a single checkpoint
  • Excellent math and reasoning for a 12B-active model (AIME 2024 89.4, MATH-500 98.1)
  • Much cheaper than the GLM-4.5 flagship ($0.20 / $1.10 vs $0.60 / $2.20 per 1M tokens) with close aggregate scores
  • Compact 106B MoE design runs on a single high-memory GPU, lowering self-hosting cost

Best for

  • Agentic coding assistants and autonomous coding agents that call tools over long horizons
  • Self-hosted deployments where open weights and commercial-friendly licensing matter
  • Function-calling and tool-orchestration backends needing reliable structured output
  • Cost-sensitive reasoning workloads (math, logic, analysis) that can't justify a flagship-size model
  • Fine-tuning and secondary development on a permissively-licensed base
  • Latency-flexible apps that switch between fast non-thinking replies and deeper thinking-mode reasoning

How to access

ProviderModel ID
Z.ai API Platform ↗glm-4.5-air
OpenRouter ↗z-ai/glm-4.5-air

FAQ

How big is GLM-4.5-Air?

GLM-4.5-Air is a Mixture-of-Experts model with 106 billion total parameters and 12 billion active per token. That makes it the compact sibling of the full GLM-4.5, which has 355B total / 32B active parameters.

Is GLM-4.5-Air open source?

Yes. Z.ai released the GLM-4.5-Air weights under the MIT license on Hugging Face and ModelScope. They can be downloaded, fine-tuned, and used commercially, including for secondary development.

What is GLM-4.5-Air's context window?

Z.ai's developer documentation lists a 128K-token context window and up to 96K tokens of output for GLM-4.5-Air. Some third-party hosts advertise the underlying 131,072-token (128K) limit.

How much does GLM-4.5-Air cost?

On the official Z.ai API, GLM-4.5-Air is priced at $0.20 per million input tokens and $1.10 per million output tokens — about a third of the GLM-4.5 flagship's $0.60 / $2.20. Open weights and cheaper third-party hosts (e.g. OpenRouter around $0.13 / $0.85) are also available.