Guidance

A programming model for steering LLMs with constrained, structured generation

Overview

Guidance is a Python library from Microsoft for steering language models. Instead of writing one big prompt and hoping the model returns the right format, you build the prompt and the generation together in code, mixing fixed text with gen() and select() calls that produce model output.

It is aimed at developers who need the output to follow a specific shape, such as a number, a choice from a list, or a string that matches a regular expression or context-free grammar. Because the constraints are applied token by token during generation, the output always matches the required syntax, so you do not need retries or post-processing to repair malformed output. The constraints guarantee the format of the answer, not its correctness.

As a structured-output and orchestration tool, Guidance works across several backends, including Transformers, llama.cpp, and OpenAI. You can also interleave control logic such as conditionals, loops, and tool use directly with generation.

What it does

Pythonic interface where model objects are immutable, so appending prompt text or a generation call with the += operator produces a new model object
Constrained generation with regular expressions and context-free grammars to guarantee output syntax
select() to force the model to pick from a known list of options
Roles via system(), user(), and assistant() context managers
Reusable @guidance functions that compose like the built-in gen() and select()
Offline grammar debugging with grammar.match() and the Mock model, with no API calls

Getting started

Install Guidance from PyPI, then build a chat-style prompt and capture constrained output. You also need a backend such as Transformers for the model you want to run.

Install

Guidance is available through PyPI and supports backends like Transformers, llama.cpp, and OpenAI. Install it with pip.

bash

pip install guidance

Generate text with a model

Load a model through a backend and build the prompt with role context managers, then capture the generated text by naming the gen() call.

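python

from guidance import system, user, assistant, gen
from guidance.models import Transformers

# Load the model once; Transformers fetches the weights on first use.
phi_lm = Transformers("microsoft/Phi-4-mini-instruct")

# Model objects are immutable, so each += below rebinds lm to a new object
# and phi_lm stays an unmodified starting point.
lm = phi_lm

with system():
    lm += "You are a helpful assistant"

with user():
    lm += "Hello. What is your name?"

with assistant():
    # name= stores the generated text under that key.
    lm += gen(name="lm_response", max_tokens=20)

print(lm['lm_response'])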

Constrain the output

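Use select() to limit the model to a fixed set of choices, or pass a regex to gen() to force, for example, a numeric answer. The example here uses select() and reuses the phi_lm model loaded in the previous step.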

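python

from guidance import user, assistant, select

# Start again from the unmodified model loaded earlier.
lm = phi_lm

with user():
    # List the options in the prompt so that each letter means something.
    lm += "What is the capital of Sweden? Answer with the correct letter.\n\n"
    lm += "A) Helsinki\nB) Reykjavik\nC) Stockholm\nD) Oslo"

with assistant():
    # select() restricts the output to exactly one of these strings.
    lm += select(["A", "B", "C", "D"], name="model_selection")

print(lm['model_selection'])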

Debug grammars offline

Validate candidate strings against a grammar and run it with the Mock model, with no model API calls.

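python

from guidance import gen
from guidance.models import Mock

# A grammar is fixed text plus a constrained gen(): digits joined by + or *.
grammar = "expr=" + gen(regex=r"\d+([+*]\d+)*", name="expr")

# match() checks a candidate string against the grammar without any model.
assert grammar.match("expr=12+7*3") is not None
assert grammar.match("expr=12+*3") is None

# Mock is an offline stand-in model, scripted here with the bytes to produce.
lm = Mock(b"<s>expr=12+7*3")
lm += grammar
print(lm["expr"])  # 12+7*3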

Commands and code are distilled from the project's own documentation. Always check the official repo for the latest.

When to use it

Forcing model output into a strict format such as JSON, a single number, or a value matching a regex
Building multiple-choice or classification flows where the answer must come from a known list
Interleaving control flow like loops and conditionals with generation, for example answering many questions in one pass
Iterating on and testing constraint grammars offline before spending tokens on real model calls

How Guidance compares

Guidance alongside the other open-source structured-output tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Guidance	★ 21.5k	A programming model for steering LLMs with constrained, structured generation
Outlines	★ 14k	A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid.
Instructor	★ 13.2k	A library that wraps an LLM client to return data validated against a schema, retrying automatically on invalid output, with SDKs in several languages.
BAML	★ 8.4k	A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages.
Marvin	★ 6.2k	A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects.
LM Format Enforcer	★ 2k	A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step.
XGrammar	★ 1.8k	A fast, portable engine for grammar-constrained decoding that guarantees LLM output follows a given structure, used inside many inference servers.
