AI/TLDR

XGrammar

Grammar-constrained decoding that guarantees structurally correct LLM output

Overview

XGrammar is an open-source library for structured generation from large language models. It uses constrained decoding to guarantee that a model's output follows a given structure, so the result is always valid against the format you ask for. It supports JSON, regex, and general context-free grammars.

It is built for developers and teams running LLM inference who need outputs they can parse and trust, such as JSON for tool calls or fixed schemas. Because it constrains the tokens a model can emit, the structure is correct by construction rather than fixed up after the fact, and it is designed to add very little overhead to JSON generation.
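
The idea of constraining tokens can be sketched in plain Python. This is a toy illustration, not XGrammar's API or implementation: the vocabulary, the `TRANSITIONS` table, and the `fake_logits` stand-in model are all hypothetical. At each step the sketch masks every token the toy grammar does not allow, then picks the highest-scoring token that remains.

```python
import json
import math

VOCAB = ["{", "}", '"name"', ":", '"Ada"', '"Bob"', "Sure!", "<eos>"]

# Hypothetical toy grammar: for each state, the tokens allowed next
# and the state each of them leads to.
TRANSITIONS = {
    "start": {"{": "key"},
    "key": {'"name"': "colon"},
    "colon": {":": "value"},
    "value": {'"Ada"': "close", '"Bob"': "close"},
    "close": {"}": "done"},
    "done": {"<eos>": "end"},
}


def fake_logits(step):
    """Stand-in for a model: ignores the step and always prefers 'Sure!'."""
    return [5.0 if tok == "Sure!" else 1.0 + 0.1 * i for i, tok in enumerate(VOCAB)]


def unconstrained_decode(steps=3):
    """Greedy decoding with no mask: the model emits whatever scores highest."""
    out = []
    for step in range(steps):
        logits = fake_logits(step)
        out.append(VOCAB[logits.index(max(logits))])
    return "".join(out)


def constrained_decode(max_steps=16):
    """Greedy decoding where disallowed tokens are masked to -inf first."""
    state, out = "start", []
    for step in range(max_steps):
        allowed = TRANSITIONS[state]
        logits = fake_logits(step)
        masked = [
            score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = VOCAB[max(range(len(VOCAB)), key=lambda i: masked[i])]
        state = allowed[best]
        if best == "<eos>":
            break
        out.append(best)
    return "".join(out)


print(unconstrained_decode())            # Sure!Sure!Sure!
print(constrained_decode())              # {"name":"Bob"}
print(json.loads(constrained_decode()))  # {'name': 'Bob'}
```

Even though the stand-in model always scores `Sure!` highest, the masked decode can only follow the grammar, so its output parses as JSON. The unmasked decode does not.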

In the LLM orchestration space, XGrammar sits at the structured-output layer. It is the default structured generation backend for several inference engines, including vLLM, SGLang, TensorRT-LLM, and MLC-LLM, and runs across Linux, macOS, and Windows on CPU and GPU hardware.

What it does

  • Constrained decoding that ensures 100% structurally correct output
  • Supports JSON, regex, and general context-free grammars for custom structures
  • Low overhead, reported as near zero for JSON generation
  • Universal deployment across Linux, macOS, and Windows on CPU, NVIDIA/AMD GPU, Apple Silicon, and TPU
  • Python, C++, JavaScript, and Swift APIs
  • Default structured generation backend in vLLM, SGLang, TensorRT-LLM, and MLC-LLM
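
To illustrate why context-free grammars matter for custom structures, the sketch below computes which tokens may come next in a toy nested-list language such as `[x,[x,x]]`. Arbitrary nesting depth is something a regular expression cannot track, so the function carries a depth counter that stands in for a parser stack. The token set and the `allowed_next` helper are hypothetical and unrelated to XGrammar's actual API or algorithm.

```python
def allowed_next(tokens):
    """Return the set of tokens that may follow a valid prefix.

    Toy grammar (an assumption for illustration):
        list := "[" (item ("," item)*)? "]"
        item := "x" | list
    The prefix is assumed to be valid so far.
    """
    depth = 0    # number of currently open brackets
    last = None  # most recent token, if any
    for tok in tokens:
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
        last = tok

    if last is None:
        return {"["}            # output must start with a list
    if depth == 0:
        return {"<eos>"}        # top-level list is closed
    if last == "[":
        return {"x", "[", "]"}  # first item, nested list, or empty list
    if last == ",":
        return {"x", "["}       # a comma must be followed by an item
    return {",", "]"}           # after an item: continue or close


print(sorted(allowed_next(["[", "[", "x", "]"])))  # [',', ']']
```

A decoder would use a set like this as the mask at each step, which is how a closing bracket can be forbidden until a matching opening bracket exists.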

Getting started

Install XGrammar with pip and import it, then follow the official docs for full quickstart usage.

Install XGrammar

Install the package from PyPI.

bash
pip install xgrammar

Install for Apple Silicon (optional)

For use with MPS on Apple Silicon, install the metal extra.

bash
pip install "xgrammar[metal]"

Import XGrammar

Import the library in Python; see the official documentation for the full quickstart.

python
import xgrammar as xgr

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Forcing an LLM to return valid JSON that matches a schema for tool calling or API responses
  • Constraining output to a custom context-free grammar or regex for a domain-specific format
  • Adding structured generation to an inference server such as vLLM or SGLang
  • Generating parseable output reliably without post-hoc validation and retries
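
For contrast, the post-hoc pattern that constrained decoding is meant to replace looks roughly like the sketch below: generate freely, validate, and retry on failure. Everything here is an assumption for illustration, including the `REQUIRED` tool-call shape, the helper names, and the canned replies that stand in for an unconstrained model.

```python
import json

# Hypothetical tool-call shape: required keys mapped to expected Python types.
REQUIRED = {"tool": str, "arguments": dict}


def is_valid_tool_call(text):
    """Return True if text is a JSON object with the required typed keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(
        isinstance(obj.get(key), typ) for key, typ in REQUIRED.items()
    )


def generate_with_retries(generate, max_attempts=3):
    """Call generate() until its output validates; return (text, attempts)."""
    for attempt in range(1, max_attempts + 1):
        text = generate()
        if is_valid_tool_call(text):
            return text, attempt
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Stand-in for an unconstrained model: the first two replies are malformed.
_replies = iter([
    'Sure! Here is the call: {"tool": "search"}',
    '{"tool": "search", "arguments": ',
    '{"tool": "search", "arguments": {"query": "xgrammar"}}',
])

text, attempts = generate_with_retries(lambda: next(_replies))
print(attempts)  # 3
```

Each failed attempt costs a full generation, and the loop can still give up. With the structure enforced during decoding, the first generation is already parseable.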

How XGrammar compares

How XGrammar compares with the other open-source structured-output tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Guidance | ★ 21.5k | A programming model that interleaves generation, prompting, and control logic to constrain output and enforce formats like JSON or regex patterns.
Outlines | ★ 14k | A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid.
Instructor | ★ 13.2k | A library that wraps an LLM client to return data validated against a schema, retrying automatically on invalid output, with SDKs in several languages.
BAML | ★ 8.4k | A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages.
Marvin | ★ 6.2k | A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects.
LM Format Enforcer | ★ 2k | A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step.
XGrammar | ★ 1.8k | Grammar-constrained decoding that guarantees structurally correct LLM output.