XGrammar

Grammar-constrained decoding that guarantees structurally correct LLM output

github.com/mlc-ai/xgrammar | ★ 1.8k | xgrammar.mlc.ai

Overview

XGrammar is an open-source library for structured generation from large language models. It uses constrained decoding to guarantee that a model's output follows a given structure, so the result is always valid against the format you ask for. It supports JSON, regex, and general context-free grammars.

It is built for developers and teams running LLM inference who need outputs they can parse and trust, such as JSON for tool calls or fixed schemas. Because it constrains the tokens a model can emit, the structure is correct by construction rather than fixed up after the fact, and it is designed to add very little overhead to JSON generation.
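
To make the idea of constraining tokens concrete, here is a minimal, library-free sketch of the masking step. It is a toy and not XGrammar's implementation: the vocabulary, the two valid outputs, and the `fake_logits` stand-in for a model are all made up for illustration, and a real engine tracks grammar state instead of comparing against a list of strings.

```python
import json

# Toy vocabulary: each "token" is a short string, as in a real tokenizer.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'maybe', ' ', '<eos>']

# Hypothetical target structure: {"ok":true} or {"ok":false}.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']


def allowed_tokens(prefix: str) -> list[bool]:
    """Return a mask with True for every token that keeps the output valid."""
    mask = []
    for tok in VOCAB:
        if tok == '<eos>':
            # Stopping is only legal once the output is complete.
            mask.append(prefix in VALID_OUTPUTS)
        else:
            candidate = prefix + tok
            mask.append(any(v.startswith(candidate) for v in VALID_OUTPUTS))
    return mask


def fake_logits(step: int) -> list[float]:
    """Stand-in for a model: it always prefers the invalid token 'maybe'."""
    scores = [0.1 * ((i + step) % 5) for i in range(len(VOCAB))]
    scores[VOCAB.index('maybe')] = 9.0
    return scores


def constrained_decode(max_steps: int = 16) -> str:
    """Greedy decoding where disallowed tokens are masked out at every step."""
    text = ''
    for step in range(max_steps):
        logits = fake_logits(step)
        mask = allowed_tokens(text)
        # Masked tokens get -inf so they can never be selected.
        masked = [s if ok else float('-inf') for s, ok in zip(logits, mask)]
        tok = VOCAB[masked.index(max(masked))]
        if tok == '<eos>':
            break
        text += tok
    return text


result = constrained_decode()
parsed = json.loads(result)  # succeeds because the structure was enforced
```

Although the stand-in model always scores `maybe` highest, that token is never a viable continuation, so it is masked out at every step and the result still parses as JSON.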

In the LLM orchestration space, XGrammar sits at the structured-output layer. It is the default structured generation backend for several inference engines, including vLLM, SGLang, TensorRT-LLM, and MLC-LLM, and runs across Linux, macOS, and Windows on CPU and GPU hardware.

What it does

Constrained decoding that ensures 100% structurally correct output
Supports JSON, regex, and general context-free grammars for custom structures
Near-zero overhead reported for JSON generation
Universal deployment across Linux, macOS, and Windows on CPU, NVIDIA/AMD GPU, Apple Silicon, and TPU
Python, C++, JavaScript, and Swift APIs
Default structured generation backend in vLLM, SGLang, TensorRT-LLM, and MLC-LLM

Getting started

Install XGrammar with pip and import it, then follow the official documentation for the full quick start.

Install XGrammar

Install the package from PyPI.

bash

pip install xgrammar

Install for Apple Silicon (optional)

For use with MPS on Apple Silicon, install the metal extra.

bash

pip install "xgrammar[metal]"

Import XGrammar

Import the library in Python; see the official documentation for the full quick start.

python

import xgrammar as xgr
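
After the import, a JSON-schema-constrained decoding step might look roughly like the sketch below. The schema is a hypothetical tool-call shape, and `make_matcher` and `constrain_step` are illustrative helper names, not part of the library. The `xgr.*` calls reflect the Python API as I understand it from the project's documentation, so treat names and signatures as assumptions to verify against the current docs. `tokenizer_info` is assumed to be an `xgr.TokenizerInfo` built from your model's tokenizer, and `logits` a PyTorch tensor.

```python
import json

# Hypothetical tool-call schema, used only for illustration.
TOOL_CALL_SCHEMA = json.dumps({
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city", "unit"],
})


def make_matcher(tokenizer_info):
    """Compile the schema and return a matcher for one generation request."""
    # Imported here so this file still loads where xgrammar is not installed.
    import xgrammar as xgr

    compiler = xgr.GrammarCompiler(tokenizer_info)
    compiled = compiler.compile_json_schema(TOOL_CALL_SCHEMA)
    return xgr.GrammarMatcher(compiled)


def constrain_step(matcher, logits, vocab_size):
    """Mask next-token logits in place so only grammar-valid tokens remain."""
    import xgrammar as xgr

    # One bitmask row for a batch of size 1.
    bitmask = xgr.allocate_token_bitmask(1, vocab_size)
    matcher.fill_next_token_bitmask(bitmask)
    xgr.apply_token_bitmask_inplace(logits, bitmask.to(logits.device))
    return logits
```

In a generation loop you would call `constrain_step` before sampling and then report the sampled token back with `matcher.accept_token(token_id)` so the matcher can advance its state. Engines such as vLLM or SGLang handle this loop internally.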

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

Forcing an LLM to return valid JSON that matches a schema for tool calling or API responses
Constraining output to a custom context-free grammar or regex for a domain-specific format
Adding structured generation to an inference server such as vLLM or SGLang
Generating parseable output reliably without post-hoc validation and retries
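
For contrast, the validate-and-retry pattern that constrained decoding is meant to replace can be sketched as follows. The canned replies, the `city` field, and `fake_generate` are hypothetical stand-ins for an unconstrained model and are not taken from any real API.

```python
import json


def validate(raw: str) -> dict:
    """Parse a reply and check its shape; raise ValueError if it is unusable."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("city"), str):
        raise ValueError("missing or non-string 'city'")
    return data


def call_with_retries(generate, max_attempts: int = 3):
    """Call the model until a reply validates; return (data, attempts used)."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        raw = generate(attempt)
        try:
            return validate(raw), attempt
        except ValueError as exc:
            errors.append(str(exc))
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {errors}")


# Hypothetical unconstrained replies: truncated JSON, wrong key, then valid.
REPLIES = ['{"city": "Par', '{"town": "Paris"}', '{"city": "Paris"}']


def fake_generate(attempt: int) -> str:
    return REPLIES[attempt - 1]


data, attempts = call_with_retries(fake_generate)
```

Here the valid reply arrives only on the third model call. With the structure enforced during decoding, the first reply is already well-formed, so the retry loop is no longer needed for structural errors.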

How XGrammar compares

XGrammar alongside the other open-source structured-output tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Guidance	★ 21.5k	A programming model that interleaves generation, prompting, and control logic to constrain output and enforce formats like JSON or regex patterns.
Outlines	★ 14k	A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid.
Instructor	★ 13.2k	A library that wraps an LLM client to return data validated against a schema, retrying automatically on invalid output, with SDKs in several languages.
BAML	★ 8.4k	A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages.
Marvin	★ 6.2k	A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects.
LM Format Enforcer	★ 2k	A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step.
XGrammar	★ 1.8k	Grammar-constrained decoding that guarantees structurally correct LLM output.
