AI/TLDR

GLM-5-Turbo

Z.ai's proprietary, API-only GLM-5 variant tuned for fast, high-throughput agent workflows.

Overview

GLM-5-Turbo is a large language model from Z.ai (Zhipu / GLM), released on March 15, 2026 as part of the GLM Turbo line. Unlike the open-weight GLM-5 flagship, GLM-5-Turbo is a proprietary, closed-source companion model that is only available through Z.ai's hosted API and partner platforms — there is no weights download.

GLM-5-Turbo is a reasoning model that uses extended chain-of-thought before answering, and it is tuned specifically for high-throughput agentic workloads. Z.ai's release notes describe it as focused on stability and efficiency in long-chain agent tasks: stronger tool and skills integration, better decomposition of complex instructions, and more consistent execution across multi-step, multi-agent workflows.

It serves text in and text out (it is not multimodal), accepts a long context window of up to 262,144 tokens, and can return up to 131,072 tokens in a single response. On the Artificial Analysis Intelligence Index it scores 38, placing GLM-5-Turbo among the stronger models tracked there at the time of release.

Released2026-03-15
LicenseProprietary
WeightsAPI only
Parameters~744B total / ~40B active (MoE, shared GLM-5 base)
Context262K
Max output128K
ArchitectureMixture-of-Experts built on the GLM-5 base (~744B total parameters, ~40B active) with DeepSeek Sparse Attention (DSA). Served as a throughput-optimised, closed API variant rather than a separately released checkpoint.
ModalitiesText
StatusAvailable

Benchmarks

  1. Artificial Analysis Intelligence Index38%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$1.20 / 1M tokens per 1M tokens
Output$4.00 / 1M tokens per 1M tokens

Pricing source ↗

Strengths

  • Tuned for high-throughput, long-chain agent tasks with improved stability over many steps
  • Strong tool and skills integration plus reliable decomposition of complex, multi-step instructions
  • Long 262K-token context window with up to 128K-token outputs for large agent workloads
  • Built on the capable GLM-5 MoE base (~744B total / ~40B active params)
  • Competitive Artificial Analysis Intelligence Index score (38) for its price tier

Best for

  • Autonomous and multi-agent systems that run long tool-calling chains
  • Agentic coding and engineering assistants that decompose multi-step tasks
  • High-volume API workloads where throughput and stability matter
  • Long-document and long-context reasoning over large inputs
  • Tool-augmented assistants that browse, call functions, and orchestrate skills

How to access

ProviderModel ID
Z.ai ↗glm-5-turbo
OpenRouter ↗z-ai/glm-5-turbo

GLM Turbo — every version

The full lineage of the GLM Turbo line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
GLM-5V-Turbocurrent2026-04-01Open weights
GLM-5-Turbo2026-03-15Open weights

FAQ

Is GLM-5-Turbo open source or open weights?

No. Unlike the GLM-5 flagship, which Z.ai released with open weights, GLM-5-Turbo is a proprietary, closed-source companion model. It is available only through Z.ai's hosted API and partner platforms such as OpenRouter — there is no weights download.

What is GLM-5-Turbo best at?

It is optimised for high-throughput agentic workloads: long tool-calling chains, multi-step task decomposition, and multi-agent coordination. Z.ai positions it for fast inference and stability across extended agent runs rather than as a maximal-quality model.

What is GLM-5-Turbo's context window and pricing?

GLM-5-Turbo supports up to a 262K-token context window and up to 128K tokens of output. On OpenRouter it is priced at about $1.20 per million input tokens and $4.00 per million output tokens.

Is GLM-5-Turbo multimodal?

No. GLM-5-Turbo is text-in, text-out only. For vision tasks, Z.ai offers a separate GLM-5V-Turbo multimodal variant.