Overview
Guidance is a Python library from Microsoft for steering language models. Instead of writing one big prompt and hoping the model returns the right format, you build the prompt and the generation together in code, mixing fixed text with gen() and select() calls that produce model output.
It is aimed at developers who need the output to follow a specific shape, such as a number, a choice from a list, or a string that matches a regular expression or context-free grammar. Because the constraints are applied during generation, you get valid output without extra retries or post-processing.
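The idea of applying constraints during generation, rather than validating afterwards, can be sketched in plain Python. This is a toy illustration of token masking, not Guidance's engine. The vocabulary, the scores, and the names `toy_scores` and `constrained_select` are hypothetical. At each step, every token that would take the partial output off all allowed options is masked out before the highest-scoring token is chosen. The result is therefore always one of the options, even though the toy model's favorite token is not a valid answer.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; a real model uses its tokenizer's vocabulary.
VOCAB: List[str] = ["I", " think", "Stock", "holm", "Oslo", "Hel", "sinki", "."]


def toy_scores(prefix: str) -> Dict[str, float]:
    """Hypothetical 'model' with fixed preferences that ignore the prefix.

    Left unconstrained, greedy decoding would pick "I" first, which is not
    a valid answer.
    """
    prefs = {"I": 5.0, " think": 4.0, "Stock": 3.0, "holm": 3.0,
             "Oslo": 2.0, "Hel": 1.0, "sinki": 1.0, ".": 0.5}
    return {tok: prefs[tok] for tok in VOCAB}


def constrained_select(options: List[str], max_steps: int = 10) -> str:
    """Greedy decoding that keeps the output a prefix of at least one option.

    Simplification: stops as soon as the output equals an option, so an
    option that is a prefix of another (e.g. "A" vs "AB") always wins.
    """
    out = ""
    for _ in range(max_steps):
        if out in options:
            return out
        scores = toy_scores(out)
        # Mask out every token that would leave all options behind.
        allowed = {tok: s for tok, s in scores.items()
                   if any(opt.startswith(out + tok) for opt in options)}
        if not allowed:
            raise ValueError("no token can continue any option")
        out += max(allowed, key=allowed.get)
    raise ValueError("no option completed within max_steps")


print(constrained_select(["Stockholm", "Oslo", "Helsinki"]))  # Stockholm
```

Because invalid tokens are never sampled, there is nothing to retry or repair afterwards. A real engine works on the model's actual tokenizer and a full grammar parser instead of a hand-written prefix check.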
As a structured-output and orchestration tool, Guidance works across several backends, including Transformers, llama.cpp, and OpenAI. You can also interleave control logic such as conditionals, loops, and tool use directly with generation.
What it does
- Pythonic interface where model objects are immutable, so appending prompt text or generation calls with the += operator produces a new model object and leaves the original unchanged
- Constrained generation with regular expressions and context-free grammars to guarantee output syntax
- select() to force the model to pick from a known list of options
- Roles via system(), user(), and assistant() context managers
- Reusable @guidance functions that compose like the built-in gen() and select()
- Offline grammar debugging with grammar.match() and the Mock model, with no API calls
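The immutable-model behavior in the first bullet is why the examples below write `lm = phi_lm` before appending: the loaded model itself is never modified, so it can be reused. A minimal pure-Python sketch of that pattern follows; `ToyLM` and `Capture` are hypothetical stand-ins for a Guidance model and a named gen() call, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Mapping, Union


@dataclass(frozen=True)
class Capture:
    """Hypothetical stand-in for a named generation such as gen(name=...)."""
    name: str
    value: str


@dataclass(frozen=True)
class ToyLM:
    """Hypothetical immutable model state: accumulated text plus named captures."""
    text: str = ""
    captures: Mapping[str, str] = field(default_factory=dict)

    def __add__(self, other: Union[str, Capture]) -> "ToyLM":
        # Never mutate self; always build and return a new object.
        if isinstance(other, Capture):
            return ToyLM(self.text + other.value,
                         {**self.captures, other.name: other.value})
        return ToyLM(self.text + other, dict(self.captures))

    def __getitem__(self, name: str) -> str:
        return self.captures[name]


base = ToyLM("System: be brief.")
lm = base                       # same idea as lm = phi_lm
lm += " Q: 2 + 2? A: "          # no __iadd__, so += rebinds lm to a new object
lm += Capture("answer", "4")
print(lm["answer"])             # 4
print(base.text)                # System: be brief.
```

Since the class defines no `__iadd__`, Python falls back to `__add__` for `+=`, so each step rebinds `lm` while `base` keeps its original text and captures.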
Getting started
Install Guidance from PyPI, then build a chat-style prompt and capture constrained output. You also need a backend such as Transformers for the model you want to run.
Install
Guidance is available through PyPI and supports backends like Transformers, llama.cpp, and OpenAI. Install it with pip.
pip install guidance
Generate text with a model
Load a model through a backend and build the prompt with role context managers, then capture the generated text by naming the gen() call.
from guidance import system, user, assistant, gen
from guidance.models import Transformers
phi_lm = Transformers("microsoft/Phi-4-mini-instruct")
lm = phi_lm
with system():
    lm += "You are a helpful assistant"
with user():
    lm += "Hello. What is your name?"
with assistant():
    lm += gen(name="lm_response", max_tokens=20)
print(lm['lm_response'])
Constrain the output
Use a regex to force a numeric answer, or select() to limit the model to a fixed set of choices.
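In Guidance, the numeric case uses the regex argument of gen(), the same argument the grammar example in this section uses. How such a constraint behaves during decoding can be sketched in plain Python. This is a toy, not Guidance's implementation. The pattern `[0-9]+` (ASCII digits only), the `<end>` token, the vocabulary, and the helper names are all hypothetical. Tokens are masked unless the output stays a prefix of a valid number, and stopping becomes legal only once the whole output matches.

```python
import re
from typing import Dict, List

# Hypothetical stand-ins: the answer pattern, an end token, and a toy vocabulary.
NUMBER = re.compile(r"[0-9]+")
END = "<end>"
VOCAB: List[str] = ["forty", "-two", "4", "2", "42", " years", END]


def toy_scores(prefix: str) -> Dict[str, float]:
    """Hypothetical model that prefers spelling the number out in words."""
    prefs = {"forty": 5.0, "-two": 4.5, "42": 3.0, "2": 2.5, "4": 2.0,
             " years": 1.0}
    # The toy model wants to stop as soon as it has written anything.
    prefs[END] = 6.0 if prefix else -1.0
    return {tok: prefs[tok] for tok in VOCAB}


def can_continue(text: str) -> bool:
    """True if `text` is still a prefix of some string matching [0-9]+."""
    return re.fullmatch(r"[0-9]*", text) is not None


def constrained_number(max_tokens: int = 5) -> str:
    out = ""
    for _ in range(max_tokens):
        allowed = {}
        for tok, score in toy_scores(out).items():
            if tok == END:
                # Stopping is only legal once the whole output matches.
                if NUMBER.fullmatch(out):
                    allowed[tok] = score
            elif can_continue(out + tok):
                allowed[tok] = score
        best = max(allowed, key=allowed.get)
        if best == END:
            return out
        out += best
    return out  # token budget exhausted; output is still digits only


print(constrained_number())  # 42
```

Left unconstrained, the same toy scores would start with the word `forty`. Under the mask, only digit tokens are legal at the first step, and `<end>` only wins once `42` fully matches.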
from guidance import select
lm = phi_lm
with user():
    lm += "What is the capital of Sweden? Answer A, B, C, or D.\nA) Helsinki\nB) Reykjavík\nC) Stockholm\nD) Oslo"
with assistant():
    lm += select(["A", "B", "C", "D"], name="model_selection")
print(lm['model_selection'])
Debug grammars offline
Validate candidate strings against a grammar and run it with the Mock model, with no model API calls.
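For the plain-regex part of a grammar, Python's own re module gives a quick cross-check before anything runs through Guidance. A whole-string `re.fullmatch` plays roughly the role that grammar.match plays in the example below. This is a sketch under assumptions: `check` is a hypothetical helper, it cannot cover select() or composed Guidance rules, and Python's `\d` also matches non-ASCII digits, which may not agree with Guidance's regex handling on such inputs.

```python
import re
from typing import Optional

# The literal prefix plus the same regex passed to gen() in the grammar below.
EXPR = re.compile(r"expr=\d+([+*]\d+)*")


def check(candidate: str) -> Optional[re.Match]:
    """Hypothetical helper: a match object if the whole string fits, else None."""
    return EXPR.fullmatch(candidate)


for s in ["expr=12+7*3", "expr=12+*3", "expr=12+7*"]:
    print(s, "->", "ok" if check(s) else "rejected")
```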
from guidance import gen
from guidance.models import Mock
grammar = "expr=" + gen(regex=r"\d+([+*]\d+)*", name="expr")
assert grammar.match("expr=12+7*3") is not None
assert grammar.match("expr=12+*3") is None
lm = Mock(b"<s>expr=12+7*3")
lm += grammar
print(lm["expr"])  # 12+7*3
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Forcing model output into a strict format such as JSON, a single number, or a value matching a regex
- Building multiple-choice or classification flows where the answer must come from a known list
- Interleaving control flow like loops and conditionals with generation, for example answering many questions in one pass
- Iterating on and testing constraint grammars offline before spending tokens on real model calls
How Guidance compares
Guidance alongside other open-source structured-output tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Guidance | ★ 21.5k | A programming model for steering LLMs with constrained, structured generation. |
| Outlines | ★ 14k | A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid. |
| Instructor | ★ 13.2k | A library that wraps an LLM client to return data validated against a schema, retrying automatically on invalid output, with SDKs in several languages. |
| BAML | ★ 8.4k | A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages. |
| Marvin | ★ 6.2k | A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects. |
| LM Format Enforcer | ★ 2k | A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step. |
| XGrammar | ★ 1.8k | A fast, portable engine for grammar-constrained decoding that guarantees LLM output follows a given structure, used inside many inference servers. |