AI/TLDR

GLM-4-9B

Tsinghua/Zhipu's open 9B model — GLM-4-9B-Chat does 128K context, tool calling and 26 languages, beating Llama-3-8B.

Overview

GLM-4-9B is the open 9-billion-parameter model in Zhipu AI's GLM-4 family (the lab now operates as Z.ai), released on 2024-06-05 by Team GLM at Tsinghua University and published under the THUDM organization on Hugging Face. It succeeds the earlier ChatGLM line and ships in several forms: the GLM-4-9B base model, the human-preference-aligned GLM-4-9B-Chat, a long-context GLM-4-9B-Chat-1M variant, and the separate GLM-4V-9B vision model.

GLM-4-9B-Chat is the flagship of the line. It supports a 128K-token context window (with a 1M-token variant available), built-in tool calling, web browsing, and code execution, and works across 26 languages including Japanese, Korean and German. According to its model card and the ChatGLM technical report, GLM-4-9B and GLM-4-9B-Chat outperform Llama-3-8B across semantics, math, reasoning, code and knowledge evaluations.

Because the weights are open, GLM-4-9B can be run locally or self-hosted, and it is also served free of charge through aggregators such as OpenRouter. The detailed methodology behind the family is documented in the paper 'ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools' (arXiv:2406.12793).

Released2024-06-05
LicenseGLM-4 License (custom open weights)
WeightsOpen weights
Parameters9B
Context128K
Max outputNot publicly disclosed
ArchitectureDense transformer (GLM-4 architecture)
Knowledge cutoffNot publicly disclosed
ModalitiesText
StatusAvailable (open weights)

Benchmarks

  1. MMLU (GLM-4-9B-Chat)72.4%
  2. C-Eval (GLM-4-9B-Chat)75.6%
  3. GSM8K (GLM-4-9B-Chat)79.6%
  4. MATH (GLM-4-9B-Chat)50.6%
  5. HumanEval (GLM-4-9B-Chat)71.8%
  6. IFEval (GLM-4-9B-Chat)69%
  7. AlignBench-v2 (GLM-4-9B-Chat)6.61%
  8. MT-Bench (GLM-4-9B-Chat)8.35%
  9. MMLU (GLM-4-9B base)74.7%
  10. C-Eval (GLM-4-9B base)77.1%
  11. GSM8K (GLM-4-9B base)84%
  12. GPQA (GLM-4-9B base)34.3%
  13. HumanEval (GLM-4-9B base)70.1%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

InputFree / 1M tokens
OutputFree / 1M tokens

Served free of charge via OpenRouter (thudm/glm-4-9b:free). Open weights can also be self-hosted at no licensing cost.

Pricing source ↗

Strengths

  • Strong capability-per-parameter for a 9B model — beats Llama-3-8B on the lab's reported semantics, math, reasoning, code and knowledge evals
  • 128K context in GLM-4-9B-Chat, with a dedicated 1M-token long-context variant
  • Native tool calling, web browsing and code execution support
  • Multilingual across 26 languages (incl. Japanese, Korean, German)
  • Open weights — runs locally / self-hosted, small enough for single-GPU or on-device use
  • Available free through OpenRouter for low-cost experimentation

Best for

  • Local and on-device chat assistants where a small open model is preferred
  • Long-document analysis using the 128K / 1M context variants
  • Tool-using and function-calling agents on a self-hosted model
  • Multilingual chat and translation across 26 languages
  • Research and fine-tuning on a permissively distributed open-weight base
  • Cost-sensitive prototyping via the free OpenRouter endpoint

How to access

ProviderModel ID
OpenRouter ↗thudm/glm-4-9b:free
Hugging Face ↗THUDM/glm-4-9b-chat

FAQ

Is GLM-4-9B open source?

The weights are openly released by Team GLM (THUDM / Zhipu) on Hugging Face under the custom GLM-4 License, so you can download, run and fine-tune the model locally. Use of the weights must comply with the GLM-4 LICENSE terms in the repository.

What context length does GLM-4-9B support?

GLM-4-9B-Chat supports up to 128K tokens. There is also a dedicated GLM-4-9B-Chat-1M variant that extends the context to 1M tokens. The plain base model uses a shorter window.

How does GLM-4-9B compare to Llama-3-8B?

Per its model card and the ChatGLM technical report, both GLM-4-9B and the aligned GLM-4-9B-Chat outperform Llama-3-8B across semantics, math, reasoning, code and knowledge evaluations. GLM-4-9B-Chat scores 72.4 on MMLU, 79.6 on GSM8K, 50.6 on MATH and 71.8 on HumanEval.

Can GLM-4-9B handle images?

The text GLM-4-9B model is text-only. For vision, the family includes a separate multimodal model, GLM-4V-9B, built on the same 9B base, which handles image understanding.