AI/TLDR

Guidance

A programming model for steering LLMs with constrained, structured generation

Overview

Guidance is a Python library from Microsoft for steering language models. Instead of writing one big prompt and hoping the model returns the right format, you build the prompt and the generation together in code, mixing fixed text with gen() and select() calls that produce model output.

It is aimed at developers who need the output to follow a specific shape, such as a number, a choice from a list, or a string that matches a regular expression or context-free grammar. Because the constraints are applied during generation, you get valid output without extra retries or post-processing.
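Here is a toy sketch of what "applied during generation" means. At each step, tokens that would break a \d+ constraint are masked out before the greedy pick, so the output is valid on the first pass. The vocabulary, the scores, and the helpers toy_scores, is_allowed, and decode are hypothetical stand-ins for a tokenizer and model logits. They are not Guidance internals.

```python
import re

# Hypothetical toy setup: a tiny vocabulary and hand-written scores stand in
# for a real tokenizer and model logits. None of these names are Guidance APIs.
NUMBER = re.compile(r"\d+")
EOS = "<eos>"


def toy_scores(text: str) -> dict:
    """Fake next-token scores; unconstrained, this 'model' prefers the word 'forty'."""
    scores = {"forty": 5.0, " ": 4.0, "4": 1.0, "2": 1.0, EOS: 0.0}
    if len(text) == 0:
        scores["4"] = 2.0
    elif len(text) == 1:
        scores["2"] = 2.0
    else:
        scores[EOS] = 3.0
    return scores


def is_allowed(text: str, token: str) -> bool:
    """Keep a token only if the output can still match NUMBER after emitting it."""
    if token == EOS:
        return NUMBER.fullmatch(text) is not None
    # For \d+, every non-empty prefix of a match is itself a match, so a full
    # match check doubles as a prefix check. General regexes need partial matching.
    return NUMBER.fullmatch(text + token) is not None


def decode(constrained: bool, max_tokens: int = 5) -> str:
    """Greedy decoding; when constrained, invalid tokens are masked before the argmax."""
    text = ""
    for _ in range(max_tokens):
        scores = toy_scores(text)
        if constrained:
            scores = {t: s for t, s in scores.items() if is_allowed(text, t)}
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        text += best
    return text


print(decode(constrained=False))  # fortyfortyfortyfortyforty
print(decode(constrained=True))   # 42
```

Masking guarantees only the format. Which valid answer comes out still depends on the model's scores.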

As a structured-output and orchestration tool, Guidance works across several backends, including Transformers, llama.cpp, and OpenAI. You can also interleave control logic such as conditionals, loops, and tool use directly with generation.

What it does

  • Pythonic interface in which model objects are immutable: you append prompt text and generation calls with the += operator, which produces a new model object
  • Constrained generation with regular expressions and context-free grammars to guarantee output syntax
  • select() to force the model to pick from a known list of options
  • Roles via system(), user(), and assistant() context managers
  • Reusable @guidance functions that compose like the built-in gen() and select()
  • Offline grammar debugging with grammar.match() and the Mock model, with no API calls
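Immutability is why the Getting started examples can write lm = phi_lm and then lm += ... without altering phi_lm: each append yields a new object. Below is a minimal sketch of that pattern. It uses a hypothetical ToyLM class, not Guidance's own model type.

```python
from dataclasses import dataclass


# Hypothetical ToyLM: a frozen dataclass that mimics the immutable-model
# pattern. It is not Guidance's model class.
@dataclass(frozen=True)
class ToyLM:
    text: str = ""

    def __add__(self, other: str) -> "ToyLM":
        # Build and return a new object; self is never modified.
        return ToyLM(self.text + other)


base = ToyLM("System: be brief.\n")
lm = base
lm += "User: hi\n"  # no __iadd__, so Python calls __add__ and rebinds lm

# Two branches from the same state do not interfere with each other.
yes_branch = lm + "Assistant: yes"
no_branch = lm + "Assistant: no"

print(repr(base.text))  # 'System: be brief.\n'
print(yes_branch.text)
print(no_branch.text)
```

Because no state is shared, one prompt prefix can be reused as the starting point for several independent continuations.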

Getting started

Install Guidance from PyPI, then build a chat-style prompt and capture constrained output. You also need a backend such as Transformers for the model you want to run.

Install

Guidance is available through PyPI and supports backends like Transformers, llama.cpp, and OpenAI. Install it with pip.

```bash
pip install guidance
```

Generate text with a model

Load a model through a backend and build the prompt with role context managers, then capture the generated text by naming the gen() call.

```python
from guidance import system, user, assistant, gen
from guidance.models import Transformers

phi_lm = Transformers("microsoft/Phi-4-mini-instruct")
lm = phi_lm

with system():
    lm += "You are a helpful assistant"

with user():
    lm += "Hello. What is your name?"

with assistant():
    lm += gen(name="lm_response", max_tokens=20)

print(lm['lm_response'])
```

Constrain the output

Use a regex to force a numeric answer, or select() to limit the model to a fixed set of choices.

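```python
from guidance import select, user, assistant

# Reuses phi_lm from the previous example; phi_lm itself is left unchanged.
lm = phi_lm

with user():
    lm += (
        "What is the capital of Sweden? Answer A, B, C, or D.\n"
        "A) Helsinki\nB) Reykjavik\nC) Stockholm\nD) Oslo"
    )

with assistant():
    lm += select(["A", "B", "C", "D"], name="model_selection")

print(lm['model_selection'])
```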
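In the example above, each option is a single letter. When the options span several tokens, a select()-style constraint has to work on prefixes: a token is kept only if the text so far plus that token is still the beginning of some allowed option. The toy sketch below uses hypothetical token chunks and fixed scores. It illustrates the idea and is not how Guidance implements select().

```python
# Hypothetical sketch of prefix masking over multi-token options.
EOS = "<eos>"
OPTIONS = ["Helsinki", "Stockholm", "Oslo"]

# Fake, state-independent scores; unconstrained, "Sweden" would always win.
TOY_SCORES = {"Sweden": 9.0, "Stock": 5.0, "holm": 4.0, "Os": 3.0,
              "lo": 2.0, "Hel": 1.0, "sinki": 1.0, EOS: 0.0}


def allowed(text: str, token: str) -> bool:
    if token == EOS:
        return text in OPTIONS  # stop only on a complete option
    return any(opt.startswith(text + token) for opt in OPTIONS)


def select_decode(max_tokens: int = 10) -> str:
    text = ""
    for _ in range(max_tokens):
        candidates = {t: s for t, s in TOY_SCORES.items() if allowed(text, t)}
        if not candidates:
            raise ValueError(f"no valid continuation after {text!r}")
        best = max(candidates, key=candidates.get)
        if best == EOS:
            return text
        text += best
    raise ValueError("max_tokens reached before a complete option")


print(select_decode())  # Stockholm
```

The mask guarantees that the answer is one of OPTIONS. Which option wins still depends on the model's scores.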

Debug grammars offline

Validate candidate strings against a grammar and run it with the Mock model, with no model API calls.

```python
from guidance import gen
from guidance.models import Mock

grammar = "expr=" + gen(regex=r"\d+([+*]\d+)*", name="expr")

assert grammar.match("expr=12+7*3") is not None
assert grammar.match("expr=12+*3") is None

lm = Mock(b"<s>expr=12+7*3")
lm += grammar
print(lm["expr"])  # 12+7*3
```
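grammar.match() checks complete strings. During generation, an engine must also decide whether a partial string can still be completed. In this sketch, that is the check that rejects "expr=12+*3" at the "*". A hand-written automaton for the same expression makes both checks explicit. The helpers step, run, is_valid_prefix, and is_full_match are hypothetical and are not how Guidance compiles grammars.

```python
import re
import string

# Hypothetical hand-written automaton for:  "expr=" \d+([+*]\d+)*
PREFIX = "expr="
DIGITS = set(string.digits)


def step(state: str, ch: str):
    """Advance one character; return the next state or None on a dead end."""
    if state in ("start", "after_op"):
        return "num" if ch in DIGITS else None
    if state == "num":
        if ch in DIGITS:
            return "num"
        if ch in "+*":
            return "after_op"
    return None


def run(text: str):
    """Return the state reached after reading text, or None if no completion exists."""
    n = min(len(text), len(PREFIX))
    if text[:n] != PREFIX[:n]:
        return None
    if len(text) < len(PREFIX):
        return "literal"  # still inside the fixed "expr=" text
    state = "start"
    for ch in text[len(PREFIX):]:
        state = step(state, ch)
        if state is None:
            return None
    return state


def is_valid_prefix(text: str) -> bool:
    return run(text) is not None


def is_full_match(text: str) -> bool:
    return run(text) == "num"  # only a digit can end a complete expression


# Cross-check full matches against Python's re with ASCII-only \d.
PATTERN = re.compile(r"expr=\d+([+*]\d+)*", re.ASCII)
for s in ["expr=12+7*3", "expr=12+", "expr=12+*3", "exp"]:
    assert is_full_match(s) == (PATTERN.fullmatch(s) is not None)
    print(f"{s!r}: prefix={is_valid_prefix(s)} full={is_full_match(s)}")
```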

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Forcing model output into a strict format such as JSON, a single number, or a value matching a regex
  • Building multiple-choice or classification flows where the answer must come from a known list
  • Interleaving control flow like loops and conditionals with generation, for example answering many questions in one pass
  • Iterating on and testing constraint grammars offline before spending tokens on real model calls

How Guidance compares

Guidance alongside the other open-source structured-output tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
|---|---|---|
| Guidance | ★ 21.5k | A programming model for steering LLMs with constrained, structured generation |
| Outlines | ★ 14k | A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid. |
| Instructor | ★ 13.2k | A library that wraps an LLM client to return data validated against a schema, retrying automatically on invalid output, with SDKs in several languages. |
| BAML | ★ 8.4k | A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages. |
| Marvin | ★ 6.2k | A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects. |
| LM Format Enforcer | ★ 2k | A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step. |
| XGrammar | ★ 1.8k | A fast, portable engine for grammar-constrained decoding that guarantees LLM output follows a given structure, used inside many inference servers. |