Overview
GLM-4.1V-9B-Thinking is an open-weight vision-language model released on July 1, 2025 by Z.ai (the Zhipu AI / GLM team) together with Tsinghua University's KEG lab. It is the first reasoning-focused entry in the GLM-V multimodal line, built on the GLM-4-9B-0414 base and paired with an AIMv2-Huge vision encoder. Despite being a roughly 9-billion-parameter model, it is positioned to compete with much larger systems on multimodal reasoning.
The defining feature is its 'thinking' paradigm: GLM-4.1V-9B-Thinking produces an explicit chain-of-thought before answering, which the team trained with a method they call Reinforcement Learning with Curriculum Sampling (RLCS). The model accepts text, images, and video, supports a 64K-token context, and handles arbitrary aspect ratios and image resolutions up to 4K via 2D-RoPE positional encoding.
Because the weights ship under an MIT license, GLM-4.1V-9B-Thinking can be downloaded, fine-tuned, and self-hosted without API fees, and its small size makes it practical to run on a single modern GPU. It is also offered as a hosted API through Z.ai's platform and third-party inference providers such as SiliconFlow for teams that prefer not to manage their own infrastructure.
| Released | 2025-07-01 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 9B |
| Context | 64K |
| Max output | 8K |
| Architecture | Vision-language model combining an AIMv2-Huge vision encoder, an MLP adapter, and a GLM language decoder built on the GLM-4-9B-0414 base. Adds a "thinking" chain-of-thought reasoning paradigm trained with Reinforcement Learning with Curriculum Sampling (RLCS). Uses 2D-RoPE to handle arbitrary aspect ratios and image resolutions up to 4K. |
| Knowledge cutoff | Not disclosed |
| Modalities | Text, Vision, Video |
| Status | Available |
Benchmarks
- MMStar72.9%
- MathVista80.7%
- MMMU68%
- MMMU-Pro57.1%
- AI2D87.9%
- MMBench-V1.1-EN85.8%
- OCRBench84.2%
- VideoMME (with subtitles)73.6%
- WeMath63.8%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.035 / 1M tokens per 1M tokens |
|---|---|
| Output | $0.14 / 1M tokens per 1M tokens |
Hosted API pricing via SiliconFlow. The model weights are open under MIT, so self-hosting incurs no per-token fee.
Strengths
- Open-weight (MIT) — free to download, fine-tune, and self-host with no usage restrictions
- Strong multimodal reasoning for its size: leads 10B-class models on 23 of 28 reported benchmarks and beats Qwen2.5-VL-72B on 18 of them
- Explicit chain-of-thought 'thinking' output improves accuracy and interpretability on STEM and math-vision tasks
- Handles arbitrary aspect ratios and up to 4K image resolution, plus video input
- Small enough (9B) to run on a single GPU, keeping inference cheap
Best for
- Solving STEM and math problems presented as images, diagrams, or charts
- Document and chart understanding, including OCR-heavy and long-document pages
- Video understanding and question answering
- GUI / screenshot understanding and UI navigation agents
- Self-hosted multimodal reasoning where data privacy or cost rules out a closed API
How to access
| Provider | Model ID |
|---|---|
| Z.ai (Zhipu / BigModel) ↗ | glm-4.1v-thinking-flash |
| SiliconFlow ↗ | zai-org/GLM-4.1V-9B-Thinking |
| Hugging Face (weights) ↗ | zai-org/GLM-4.1V-9B-Thinking |
GLM-V (vision-language) — every version
The full lineage of the GLM-V (vision-language) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
Is GLM-4.1V-9B-Thinking open source?
Yes. The weights are released under the MIT license on Hugging Face (zai-org/GLM-4.1V-9B-Thinking), so you can download, fine-tune, and self-host the model commercially without per-token fees.
What can GLM-4.1V-9B-Thinking process besides text?
It is a vision-language model that accepts images and video alongside text, with text output. It handles arbitrary aspect ratios and image resolutions up to 4K, and supports a 64K-token context window.
How does a 9B model compete with 72B models?
Its 'thinking' chain-of-thought paradigm, trained with Reinforcement Learning with Curriculum Sampling (RLCS), lets it reason step by step before answering. In Z.ai's technical report it leads 10B-class models on 23 of 28 benchmarks and outperforms the much larger Qwen2.5-VL-72B on 18 of them.
How much does GLM-4.1V-9B-Thinking cost to use?
Self-hosting is free since the weights are open. As a hosted API it is inexpensive — SiliconFlow lists about $0.035 per million input tokens and $0.14 per million output tokens, and it is also served through Z.ai's own platform.