Overview
GLM-5-Turbo is a large language model from Z.ai (Zhipu / GLM), released on March 15, 2026 as part of the GLM Turbo line. Unlike the open-weight GLM-5 flagship, GLM-5-Turbo is a proprietary, closed-source companion model that is only available through Z.ai's hosted API and partner platforms — there is no weights download.
GLM-5-Turbo is a reasoning model that uses extended chain-of-thought before answering, and it is tuned specifically for high-throughput agentic workloads. Z.ai's release notes describe it as focused on stability and efficiency in long-chain agent tasks: stronger tool and skills integration, better decomposition of complex instructions, and more consistent execution across multi-step, multi-agent workflows.
It serves text in and text out (it is not multimodal), accepts a long context window of up to 262,144 tokens, and can return up to 131,072 tokens in a single response. On the Artificial Analysis Intelligence Index it scores 38, placing GLM-5-Turbo among the stronger models tracked there at the time of release.
| Released | 2026-03-15 |
|---|---|
| License | Proprietary |
| Weights | API only |
| Parameters | ~744B total / ~40B active (MoE, shared GLM-5 base) |
| Context | 262K |
| Max output | 128K |
| Architecture | Mixture-of-Experts built on the GLM-5 base (~744B total parameters, ~40B active) with DeepSeek Sparse Attention (DSA). Served as a throughput-optimised, closed API variant rather than a separately released checkpoint. |
| Modalities | Text |
| Status | Available |
Benchmarks
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $1.20 / 1M tokens per 1M tokens |
|---|---|
| Output | $4.00 / 1M tokens per 1M tokens |
Strengths
- Tuned for high-throughput, long-chain agent tasks with improved stability over many steps
- Strong tool and skills integration plus reliable decomposition of complex, multi-step instructions
- Long 262K-token context window with up to 128K-token outputs for large agent workloads
- Built on the capable GLM-5 MoE base (~744B total / ~40B active params)
- Competitive Artificial Analysis Intelligence Index score (38) for its price tier
Best for
- Autonomous and multi-agent systems that run long tool-calling chains
- Agentic coding and engineering assistants that decompose multi-step tasks
- High-volume API workloads where throughput and stability matter
- Long-document and long-context reasoning over large inputs
- Tool-augmented assistants that browse, call functions, and orchestrate skills
How to access
| Provider | Model ID |
|---|---|
| Z.ai ↗ | glm-5-turbo |
| OpenRouter ↗ | z-ai/glm-5-turbo |
GLM Turbo — every version
The full lineage of the GLM Turbo line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| GLM-5V-Turbocurrent | 2026-04-01 | — | Open weights |
| GLM-5-Turbo | 2026-03-15 | — | Open weights |
FAQ
Is GLM-5-Turbo open source or open weights?
No. Unlike the GLM-5 flagship, which Z.ai released with open weights, GLM-5-Turbo is a proprietary, closed-source companion model. It is available only through Z.ai's hosted API and partner platforms such as OpenRouter — there is no weights download.
What is GLM-5-Turbo best at?
It is optimised for high-throughput agentic workloads: long tool-calling chains, multi-step task decomposition, and multi-agent coordination. Z.ai positions it for fast inference and stability across extended agent runs rather than as a maximal-quality model.
What is GLM-5-Turbo's context window and pricing?
GLM-5-Turbo supports up to a 262K-token context window and up to 128K tokens of output. On OpenRouter it is priced at about $1.20 per million input tokens and $4.00 per million output tokens.
Is GLM-5-Turbo multimodal?
No. GLM-5-Turbo is text-in, text-out only. For vision tasks, Z.ai offers a separate GLM-5V-Turbo multimodal variant.