AI/TLDR

GLM-4.5V

Open-weight 106B/12B-active vision-language MoE with a toggleable thinking mode for images, video, documents and GUI agents.

Overview

GLM-4.5V is the open-weight vision-language model in Z.ai's (Zhipu / GLM) GLM-V line, released on August 11, 2025. It is a Mixture-of-Experts model with 106B total parameters and 12B activated per token, built on top of the GLM-4.5-Air-Base text foundation. GLM-4.5V handles images (up to 4K resolution, arbitrary aspect ratio), video, multi-image prompts, and documents such as PDFs and slide decks, and it can output precise bounding-box grounding for elements in a scene.

Like the earlier GLM-4.1V-9B-Thinking, GLM-4.5V ships with a toggleable thinking mode: users can switch between fast direct answers and a deeper chain-of-thought pass for harder visual reasoning, OCR, chart and document parsing, video understanding, and GUI-agent tasks (screen reading, icon detection, desktop operation). Z.ai reports state-of-the-art results among similarly sized open models across 42 public vision-language benchmarks.

The weights are released under the MIT license on Hugging Face (zai-org/GLM-4.5V) and the GLM-V GitHub repo, allowing commercial use and fine-tuning. GLM-4.5V is also served through the Z.ai API platform and third parties such as OpenRouter, with an exposed context window of roughly 64K tokens and up to 16K tokens of output.

Released2025-08-11
LicenseMIT
WeightsOpen weights
Parameters106B total / 12B active (MoE)
Context64K
Max output16K
ArchitectureMixture-of-Experts vision-language model built on the GLM-4.5-Air-Base text foundation (106B parameters, 12B activated). Pairs an image/video vision encoder with the MoE language model and adds a user-toggleable "thinking" mode that trades latency for deeper multimodal reasoning. Trained with scalable reinforcement learning for visual reasoning, grounding (bounding-box output), and GUI-agent control.
Knowledge cutoffDecember 2024
ModalitiesText, Vision, Video, PDF
StatusAvailable

Benchmarks

  1. MMBench V1.1 (thinking)88.2%
  2. MMStar (thinking)75.3%
  3. MMMU val (thinking)75.4%
  4. MMMU Pro (thinking)65.2%
  5. MathVista (thinking)84.6%
  6. AI2D (thinking)88.1%
  7. OCRBench (thinking)86.5%
  8. ChartQAPro (thinking)64%
  9. ChartMuseum (thinking)55.3%
  10. OSWorld (GUI agent, thinking)35.8%
  11. WebVoyager (GUI agent, thinking)84.4%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.60 / 1M tokens per 1M tokens
Cached input$0.11 / 1M tokens per 1M tokens
Output$1.80 / 1M tokens per 1M tokens

Z.ai API platform list price. Open weights (MIT) can also be self-hosted at no per-token cost.

Pricing source ↗

Strengths

  • Open weights under the permissive MIT license — free for commercial use and fine-tuning
  • Efficient MoE design: 106B total parameters but only 12B activated per token
  • Toggleable thinking mode balances fast responses against deeper multimodal reasoning
  • Broad visual coverage: images up to 4K, video, multi-image, PDFs and slides in one pass
  • Strong STEM and chart/document scores (MMMU, MathVista, AI2D, ChartQAPro)
  • Native grounding with bounding-box output and GUI-agent control (OSWorld, WebVoyager)

Best for

  • Document, chart and long-PDF understanding and information extraction
  • OCR and structured data extraction from images and plots
  • Long-video segmentation and event recognition
  • GUI agents: screen reading, icon detection and desktop operation assistance
  • Visual grounding and spatial localization with bounding boxes
  • Multi-image scene analysis, defect inspection and geo/context inference

How to access

ProviderModel ID
Z.ai API Platform ↗glm-4.5v
OpenRouter ↗z-ai/glm-4.5v

GLM-V (vision-language) — every version

The full lineage of the GLM-V (vision-language) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
GLM-4.6Vcurrent2025-12-08MIT
GLM-4.5V2025-08-11Open weights
GLM-4.1V-9B-Thinking2025-07-01Open weights

FAQ

Is GLM-4.5V open source?

Yes. The weights are released under the permissive MIT license on Hugging Face (zai-org/GLM-4.5V) and the GLM-V GitHub repo, allowing commercial use, redistribution and fine-tuning.

How big is GLM-4.5V?

It is a Mixture-of-Experts model with 106B total parameters and 12B activated per token, built on the GLM-4.5-Air-Base text foundation. The efficient MoE design keeps inference cost closer to a 12B model.

What inputs does GLM-4.5V support?

Text plus visual inputs: images up to 4K resolution at any aspect ratio, video, multi-image prompts, and documents such as PDFs and slide decks. It can also output bounding-box grounding and drive GUI agents.

What does GLM-4.5V cost to use?

On the Z.ai API platform it lists at $0.60 per million input tokens and $1.80 per million output tokens, with cached input at $0.11 per million. Because the weights are MIT-licensed, you can also self-host with no per-token fee.