LM Format Enforcer

Force an LLM's output to match a JSON Schema or regex by filtering its tokens

github.com/noamgat/lm-format-enforcer ★ 2k

Overview

LM Format Enforcer is a Python library that makes a language model stick to a required output format, such as a JSON Schema or a regular expression. Instead of relying on prompt wording and hoping the model complies, it filters the tokens the model is allowed to produce at each step, so the result always matches the format you asked for.

It is aimed at developers who need reliable structured output from local or self-hosted models and are tired of parsing and re-trying malformed responses. It works with any Python language model and tokenizer, and already integrates with transformers, LangChain, LlamaIndex, llama.cpp, vLLM, Haystack, NVIDIA TensorRT-LLM, and ExLlamaV2.

As a structured-output tool, it slots into your existing generation pipeline rather than replacing it. Within a JSON schema it leaves whitespace and field ordering to the model, which the project says keeps the text natural and reduces hallucinations, while still guaranteeing the shape of the output.
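
The per-step token filtering described above can be illustrated with a toy decoder. This is a sketch of the general idea only, not the library's implementation: the vocabulary, the scores, and the "format" (a fixed set of allowed strings) are all made up for the example.

```python
# Toy illustration only: this is NOT lm-format-enforcer's implementation.
# The vocabulary, scores, and allowed outputs are assumptions for the example.

ALLOWED_OUTPUTS = ["yes", "no"]  # the "format": output must be one of these
VOCAB = ["y", "n", "es", "o", "maybe", "yes", "!"]  # hypothetical token strings


def allowed_tokens(prefix: str) -> list[str]:
    """Return tokens that keep prefix + token a prefix of some allowed output."""
    return [
        tok for tok in VOCAB
        if any(target.startswith(prefix + tok) for target in ALLOWED_OUTPUTS)
    ]


def constrained_decode(scores: dict[str, float]) -> str:
    """Greedy decoding where disallowed tokens are filtered out before picking."""
    text = ""
    while text not in ALLOWED_OUTPUTS:
        candidates = allowed_tokens(text)
        # Pick the highest-scoring token among those the format still permits.
        text += max(candidates, key=lambda tok: scores[tok])
    return text


# Hypothetical model preferences: unconstrained, the model would pick "maybe".
fake_scores = {"maybe": 0.9, "y": 0.5, "es": 0.4, "n": 0.3,
               "o": 0.2, "yes": 0.1, "!": 0.05}

print(allowed_tokens(""))
print(constrained_decode(fake_scores))
```

The model's top choice ("maybe") never gets picked because it is filtered out before selection, yet the model still decides between the permitted continuations. A real enforcer applies the same principle to a JSON Schema or regex and to a real tokenizer's vocabulary.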

What it does

Enforces JSON Schema, schemaless JSON mode, and regular-expression formats
Works with any Python language model and tokenizer; integrates with transformers, LangChain, LlamaIndex, llama.cpp, vLLM, Haystack, TensorRT-LLM, and ExLlamaV2
Supports batched generation and beam search, filtering tokens per input or per beam
Handles required and optional fields, plus nested objects, arrays, and dictionaries in JSON schemas
Lets the model control whitespace and field ordering, reducing hallucinations
Drops into existing pipelines without changing the high-level generation loop
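
The schema features listed above (required and optional fields, nested objects, arrays) can be written as a plain JSON Schema dictionary. The sketch below uses only the standard library; the field names are invented for illustration, and the helper `missing_required` is a hypothetical convenience that checks top-level required keys only. It is not a validator and not part of the library.

```python
import json

# Hypothetical schema for illustration; the field names are not from the project.
PLAYER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "nickname": {"type": "string"},  # optional: not listed in "required"
        "teams": {  # array of nested objects
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "team": {"type": "string"},
                    "seasons": {"type": "integer"},
                },
                "required": ["team", "seasons"],
            },
        },
    },
    "required": ["name", "teams"],
}


def missing_required(schema: dict, text: str) -> list[str]:
    """List top-level required keys absent from a JSON document."""
    data = json.loads(text)
    return [key for key in schema["required"] if key not in data]


# Two hand-written documents: both satisfy "required", in different key orders,
# and only the first uses the optional field.
with_optional = '{"name": "A", "nickname": "B", "teams": [{"team": "X", "seasons": 3}]}'
without_optional = '{"teams": [], "name": "A"}'

print(missing_required(PLAYER_SCHEMA, with_optional))
print(missing_required(PLAYER_SCHEMA, without_optional))
```

Both documents are acceptable under the schema even though their key order differs and one omits `nickname`; that is the kind of freedom the enforcer leaves to the model.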

Getting started

Install the package, then build a parser from your schema and pass it into your generation pipeline as a token filter. The example below uses Hugging Face transformers, straight from the README.

Install the package

Install from PyPI. For the transformers example you also need transformers, torch, and the model's runtime dependencies.

bash

pip install lm-format-enforcer

Define your output schema

Describe the shape you want with a Pydantic model.

python

from pydantic import BaseModel

class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int

Build a token filter and run generation

Create a JsonSchemaParser, build a prefix function from it and your tokenizer, then pass that function to the transformers pipeline so the output matches the schema.

python

from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import build_transformers_prefix_allowed_tokens_fn
from transformers import pipeline

# Load a text-generation pipeline; AnswerFormat is the Pydantic model defined above.
hf_pipeline = pipeline('text-generation', model='TheBloke/Llama-2-7b-Chat-GPTQ', device_map='auto')
prompt = f'Here is information about Michael Jordan in the following json schema: {AnswerFormat.schema_json()} :\n'

# .schema() and .schema_json() are Pydantic v1 method names. Pydantic v2 still
# accepts them but marks them deprecated in favor of .model_json_schema().
parser = JsonSchemaParser(AnswerFormat.schema())
prefix_function = build_transformers_prefix_allowed_tokens_fn(hf_pipeline.tokenizer, parser)

# Generate with the token filter, then strip the prompt from the returned text.
output_dict = hf_pipeline(prompt, prefix_allowed_tokens_fn=prefix_function)
result = output_dict[0]['generated_text'][len(prompt):]
print(result)

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.
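
Once generation finishes, the `result` string from the example above still has to be turned into Python data. The following is a minimal sketch using only the standard library, not code from the project: the helper name `parse_answer` and the sample strings are illustrative assumptions. It guards against incomplete JSON because generation can stop early, for example when a token limit is reached before the closing brace.

```python
import json
from typing import Optional

# Field names mirror the AnswerFormat model defined above.
EXPECTED_FIELDS = ("first_name", "last_name", "year_of_birth", "num_seasons_in_nba")


def parse_answer(result: str) -> Optional[dict]:
    """Parse generated text into a dict, or return None if it is unusable."""
    try:
        data = json.loads(result)
    except json.JSONDecodeError:
        return None  # e.g. generation was cut off before the JSON was closed
    if not isinstance(data, dict) or any(f not in data for f in EXPECTED_FIELDS):
        return None
    return data


# Illustrative strings written by hand, NOT real model output.
complete = ('{"first_name": "Ada", "last_name": "Example", '
            '"year_of_birth": 1990, "num_seasons_in_nba": 0}')
truncated = '{"first_name": "Ada", "last_name": "Exam'

print(parse_answer(complete))
print(parse_answer(truncated))
```

If you already use Pydantic, validating the parsed data against `AnswerFormat` is a natural next step; the dictionary check here just keeps the sketch dependency-free.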

When to use it

Extracting reliable JSON records from a local model for downstream parsing without retry loops
Serving structured output from a vLLM OpenAI-compatible server using the lm-format-enforcer guided-decoding backend
Constraining model output to a regular expression, for example a date, phone number, or enum value
Producing valid responses for nested schemas with optional fields inside transformers, llama.cpp, or LlamaIndex pipelines
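
For the regular-expression use case, the format is just a pattern string. The patterns below are illustrative assumptions for the date, phone number, and enum examples, checked here with Python's standard `re` module rather than with the library. lm-format-enforcer may support a narrower regex syntax than `re`, so check its documentation before reusing them.

```python
import re

# Hypothetical patterns for the three examples mentioned above.
PATTERNS = {
    "date": r"\d{4}-\d{2}-\d{2}",        # ISO-style shape, e.g. 2024-01-31
    "phone": r"\(\d{3}\) \d{3}-\d{4}",   # US-style shape, e.g. (555) 010-9999
    "enum": r"(red|green|blue)",         # one value from a fixed set
}


def matches(kind: str, text: str) -> bool:
    """True if the whole string satisfies the named pattern."""
    return re.fullmatch(PATTERNS[kind], text) is not None


print(matches("date", "2024-01-31"))
print(matches("date", "Jan 31, 2024"))
print(matches("enum", "green"))
```

A pattern constrains the shape of the output, not its truth: the date pattern above also accepts a string such as 2024-13-45, so semantic checks still belong in your own code.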

How LM Format Enforcer compares

LM Format Enforcer alongside other open-source structured output tools AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Guidance	★ 21.5k	A programming model that interleaves generation, prompting, and control logic to constrain output and enforce formats like JSON or regex patterns.
Outlines	★ 14k	A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid.
Instructor	★ 13.2k	A library that wraps an LLM client to return data validated against a schema, retrying automatically on invalid output, with SDKs in several languages.
BAML	★ 8.4k	A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages.
Marvin	★ 6.2k	A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects.
LM Format Enforcer	★ 2k	Force an LLM's output to match a JSON Schema or regex by filtering its tokens
XGrammar	★ 1.8k	A fast, portable engine for grammar-constrained decoding that guarantees LLM output follows a given structure, used inside many inference servers.
