GLM-4-9B

Tsinghua/Zhipu's open 9B model — GLM-4-9B-Chat does 128K context, tool calling and 26 languages, beating Llama-3-8B.

Overview

GLM-4-9B is the open 9-billion-parameter model in Zhipu AI's GLM-4 family (the lab now operates as Z.ai), released on 2024-06-05 by Team GLM at Tsinghua University and published under the THUDM organization on Hugging Face. It succeeds the earlier ChatGLM line and ships in several forms: the GLM-4-9B base model, the human-preference-aligned GLM-4-9B-Chat, a long-context GLM-4-9B-Chat-1M variant, and the separate GLM-4V-9B vision model.

GLM-4-9B-Chat is the flagship of the line. It supports a 128K-token context window (with a 1M-token variant available), built-in tool calling, web browsing, and code execution, and works across 26 languages including Japanese, Korean and German. According to its model card and the ChatGLM technical report, GLM-4-9B and GLM-4-9B-Chat outperform Llama-3-8B across semantics, math, reasoning, code and knowledge evaluations.

Because the weights are open, GLM-4-9B can be run locally or self-hosted, and it is also served free of charge through aggregators such as OpenRouter. The detailed methodology behind the family is documented in the paper 'ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools' (arXiv:2406.12793).

Released	2024-06-05
License	GLM-4 License (custom open weights)
Weights	Open weights
Parameters	9B
Context	128K
Max output	Not publicly disclosed
Architecture	Dense transformer (GLM-4 architecture)
Knowledge cutoff	Not publicly disclosed
Modalities	Text
Status	Available (open weights)

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	Free / 1M tokens
Output	Free / 1M tokens

Served free of charge via OpenRouter (thudm/glm-4-9b:free). Open weights can also be self-hosted at no licensing cost.

Pricing source ↗

Strengths

Strong capability-per-parameter for a 9B model — beats Llama-3-8B on the lab's reported semantics, math, reasoning, code and knowledge evals
128K context in GLM-4-9B-Chat, with a dedicated 1M-token long-context variant
Native tool calling, web browsing and code execution support
Multilingual across 26 languages (incl. Japanese, Korean, German)
Open weights — runs locally / self-hosted, small enough for single-GPU or on-device use
Available free through OpenRouter for low-cost experimentation

Best for

Local and on-device chat assistants where a small open model is preferred
Long-document analysis using the 128K / 1M context variants
Tool-using and function-calling agents on a self-hosted model
Multilingual chat and translation across 26 languages
Research and fine-tuning on a permissively distributed open-weight base
Cost-sensitive prototyping via the free OpenRouter endpoint

How to access

Provider	Model ID
OpenRouter ↗	`thudm/glm-4-9b:free`
Hugging Face ↗	`THUDM/glm-4-9b-chat`

FAQ

Is GLM-4-9B open source?

The weights are openly released by Team GLM (THUDM / Zhipu) on Hugging Face under the custom GLM-4 License, so you can download, run and fine-tune the model locally. Use of the weights must comply with the GLM-4 LICENSE terms in the repository.

What context length does GLM-4-9B support?

GLM-4-9B-Chat supports up to 128K tokens. There is also a dedicated GLM-4-9B-Chat-1M variant that extends the context to 1M tokens. The plain base model uses a shorter window.

How does GLM-4-9B compare to Llama-3-8B?

Per its model card and the ChatGLM technical report, both GLM-4-9B and the aligned GLM-4-9B-Chat outperform Llama-3-8B across semantics, math, reasoning, code and knowledge evaluations. GLM-4-9B-Chat scores 72.4 on MMLU, 79.6 on GSM8K, 50.6 on MATH and 71.8 on HumanEval.

Can GLM-4-9B handle images?

The text GLM-4-9B model is text-only. For vision, the family includes a separate multimodal model, GLM-4V-9B, built on the same 9B base, which handles image understanding.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// FAQ