Overview
GLM-4.5V is the open-weight vision-language model in Z.ai's (Zhipu / GLM) GLM-V line, released on August 11, 2025. It is a Mixture-of-Experts model with 106B total parameters and 12B activated per token, built on top of the GLM-4.5-Air-Base text foundation. GLM-4.5V handles images (up to 4K resolution, arbitrary aspect ratio), video, multi-image prompts, and documents such as PDFs and slide decks, and it can output precise bounding-box grounding for elements in a scene.
Like the earlier GLM-4.1V-9B-Thinking, GLM-4.5V ships with a toggleable thinking mode: users can switch between fast direct answers and a deeper chain-of-thought pass for harder visual reasoning, OCR, chart and document parsing, video understanding, and GUI-agent tasks (screen reading, icon detection, desktop operation). Z.ai reports state-of-the-art results among similarly sized open models across 42 public vision-language benchmarks.
The weights are released under the MIT license on Hugging Face (zai-org/GLM-4.5V) and the GLM-V GitHub repo, allowing commercial use and fine-tuning. GLM-4.5V is also served through the Z.ai API platform and third parties such as OpenRouter, with an exposed context window of roughly 64K tokens and up to 16K tokens of output.
| Released | 2025-08-11 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 106B total / 12B active (MoE) |
| Context | 64K |
| Max output | 16K |
| Architecture | Mixture-of-Experts vision-language model built on the GLM-4.5-Air-Base text foundation (106B parameters, 12B activated). Pairs an image/video vision encoder with the MoE language model and adds a user-toggleable "thinking" mode that trades latency for deeper multimodal reasoning. Trained with scalable reinforcement learning for visual reasoning, grounding (bounding-box output), and GUI-agent control. |
| Knowledge cutoff | December 2024 |
| Modalities | Text, Vision, Video, PDF |
| Status | Available |
Benchmarks
- MMBench V1.1 (thinking)88.2%
- MMStar (thinking)75.3%
- MMMU val (thinking)75.4%
- MMMU Pro (thinking)65.2%
- MathVista (thinking)84.6%
- AI2D (thinking)88.1%
- OCRBench (thinking)86.5%
- ChartQAPro (thinking)64%
- ChartMuseum (thinking)55.3%
- OSWorld (GUI agent, thinking)35.8%
- WebVoyager (GUI agent, thinking)84.4%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.60 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.11 / 1M tokens per 1M tokens |
| Output | $1.80 / 1M tokens per 1M tokens |
Z.ai API platform list price. Open weights (MIT) can also be self-hosted at no per-token cost.
Strengths
- Open weights under the permissive MIT license — free for commercial use and fine-tuning
- Efficient MoE design: 106B total parameters but only 12B activated per token
- Toggleable thinking mode balances fast responses against deeper multimodal reasoning
- Broad visual coverage: images up to 4K, video, multi-image, PDFs and slides in one pass
- Strong STEM and chart/document scores (MMMU, MathVista, AI2D, ChartQAPro)
- Native grounding with bounding-box output and GUI-agent control (OSWorld, WebVoyager)
Best for
- Document, chart and long-PDF understanding and information extraction
- OCR and structured data extraction from images and plots
- Long-video segmentation and event recognition
- GUI agents: screen reading, icon detection and desktop operation assistance
- Visual grounding and spatial localization with bounding boxes
- Multi-image scene analysis, defect inspection and geo/context inference
How to access
| Provider | Model ID |
|---|---|
| Z.ai API Platform ↗ | glm-4.5v |
| OpenRouter ↗ | z-ai/glm-4.5v |
GLM-V (vision-language) — every version
The full lineage of the GLM-V (vision-language) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| GLM-4.6Vcurrent | 2025-12-08 | — | MIT |
| GLM-4.5V | 2025-08-11 | — | Open weights |
| GLM-4.1V-9B-Thinking | 2025-07-01 | — | Open weights |
FAQ
Is GLM-4.5V open source?
Yes. The weights are released under the permissive MIT license on Hugging Face (zai-org/GLM-4.5V) and the GLM-V GitHub repo, allowing commercial use, redistribution and fine-tuning.
How big is GLM-4.5V?
It is a Mixture-of-Experts model with 106B total parameters and 12B activated per token, built on the GLM-4.5-Air-Base text foundation. The efficient MoE design keeps inference cost closer to a 12B model.
What inputs does GLM-4.5V support?
Text plus visual inputs: images up to 4K resolution at any aspect ratio, video, multi-image prompts, and documents such as PDFs and slide decks. It can also output bounding-box grounding and drive GUI agents.
What does GLM-4.5V cost to use?
On the Z.ai API platform it lists at $0.60 per million input tokens and $1.80 per million output tokens, with cached input at $0.11 per million. Because the weights are MIT-licensed, you can also self-host with no per-token fee.