AI/TLDR

Instructor

Reliable structured outputs from any LLM, validated with Pydantic

Overview

Instructor is a library for getting structured, validated data out of large language models. You define the shape you want as a Pydantic model, pass it as a response_model, and Instructor returns a typed object instead of raw text you have to parse yourself. It is built on Pydantic, so you get validation, type hints, and IDE autocompletion for the result.

It is aimed at developers who call an LLM to extract or classify information and need the output to match a fixed schema every time. Instead of writing JSON schemas by hand, parsing responses, and handling validation errors, you write one model and let the library handle the rest. When a response fails validation, Instructor retries the call automatically with the error attached.

As a structured-output tool, it wraps the provider client rather than replacing it. The same code works across OpenAI, Anthropic, Google, Ollama, and others through a single from_provider interface, and the project ships SDKs for Python, TypeScript, Ruby, Go, Elixir, and Rust.

What it does

  • Define output shape as a Pydantic model and get a validated, typed object back
  • One from_provider interface works across OpenAI, Anthropic, Google, Ollama, Groq, and more
  • Automatic retries that re-send the validation error when output fails to validate
  • Streaming support via Partial[Model] to receive fields as they are generated
  • Extracts nested and list-based structures, not just flat objects
  • Available in Python, TypeScript, Ruby, Go, Elixir, and Rust
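
The nested and list-based case can be sketched with Pydantic alone. The model names (`Address`, `Person`, `People`) and the sample JSON below are illustrative assumptions, not taken from the Instructor docs. The JSON stands in for a model response, so the snippet runs offline. It assumes Pydantic v2.

```python
from typing import List

from pydantic import BaseModel


class Address(BaseModel):
    city: str
    country: str


class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]  # nested list of sub-models


class People(BaseModel):
    people: List[Person]  # top-level list wrapper


# Hypothetical payload standing in for what an LLM might return.
sample = """
{"people": [
  {"name": "John", "age": 25,
   "addresses": [{"city": "Paris", "country": "France"}]},
  {"name": "Ana", "age": 31, "addresses": []}
]}
"""

# Validation builds typed objects at every level of nesting.
result = People.model_validate_json(sample)
print(result.people[0].addresses[0].city)
print(len(result.people))
```

A model shaped like `People` is the kind of object you would pass as response_model to get the whole nested structure back in one call.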

Getting started

Install the package, define a model for the data you want, and call the client with response_model set to that model.

Install Instructor

Install from PyPI with pip, or add it with your package manager.

bash
pip install instructor

Extract structured data

Define a Pydantic model, create a client with from_provider, and pass the model as response_model. The result is a validated, typed object.

python
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)

print(repr(user))  # e.g. User(name='John', age=25)

Retry on validation failure

Add validators to your model and set max_retries. Instructor re-sends the call with the error message when validation fails.

python
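import instructor
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def validate_age(cls, v: int) -> int:
        # Reject negative ages; the error text is what gets sent back on retry
        if v < 0:
            raise ValueError('Age must be non-negative')
        return v

client = instructor.from_provider("openai/gpt-4o-mini")

# Retry up to 3 times if the response fails validation
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    max_retries=3,
)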
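
To make the retry idea concrete, here is a conceptual sketch of a validate-and-retry loop written with Pydantic only. It is not Instructor's actual implementation. `fake_llm` and `create_with_retries` are hypothetical helpers invented for illustration, and the fake model is rigged to return a bad answer first so the loop can be run offline.

```python
import json

from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0:
            raise ValueError("Age must be non-negative")
        return v


def fake_llm(messages):
    """Hypothetical stand-in for a model call.

    Returns an invalid age until a validation error appears in the
    conversation, then returns a corrected answer.
    """
    saw_error = any("Validation error" in m["content"] for m in messages)
    return json.dumps({"name": "John", "age": 25 if saw_error else -5})


def create_with_retries(messages, max_retries=3):
    """Validate the response; on failure, feed the error back and retry."""
    messages = list(messages)  # do not mutate the caller's list
    for attempt in range(max_retries + 1):
        raw = fake_llm(messages)
        try:
            return User.model_validate_json(raw), attempt
        except ValidationError as exc:
            # Attach the failed output and the error so the next call can fix it.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Validation error, please fix:\n{exc}"}
            )
    raise RuntimeError("No valid response within max_retries")


user, retries_used = create_with_retries(
    [{"role": "user", "content": "John is 25 years old"}]
)
print(repr(user), retries_used)
```

The design point is that the validator's error message doubles as the correction prompt, so clearer error text in your validators tends to give the model a better chance of fixing its output.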

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Pull structured fields (name, price, in-stock status) out of free-form product or document text
  • Build extraction pipelines that must return data matching a fixed schema on every call
  • Swap LLM providers without rewriting parsing code, since the same model and call work across providers
  • Stream partial results into a UI as the model fills in each field
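
The product-extraction and provider-swap cases can be sketched together. The `Product` model, the `extract_product` helper, and the stub client below are illustrative assumptions. The stub only mimics the `client.chat.completions.create(response_model=..., messages=...)` call shape used in the examples above, so the snippet runs without a network call or API key.

```python
from types import SimpleNamespace

from pydantic import BaseModel


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool


def extract_product(client, text: str) -> Product:
    """Depends only on the create(...) call shape, not on a specific provider."""
    return client.chat.completions.create(
        response_model=Product,
        messages=[{"role": "user", "content": f"Extract the product: {text}"}],
    )


# Hypothetical offline stand-in for a real client: returns a canned object.
def _fake_create(response_model, messages, **kwargs):
    return response_model(name="Desk lamp", price=19.99, in_stock=True)


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

product = extract_product(fake_client, "Desk lamp, $19.99, ships today")
print(product.name, product.price, product.in_stock)
```

With a real client created by instructor.from_provider, the same `extract_product` function should work unchanged. Under that assumption, switching providers means changing only the string passed to from_provider.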

How Instructor compares

Instructor alongside other open-source structured output tools AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Guidance | ★ 21.5k | A programming model that interleaves generation, prompting, and control logic to constrain output and enforce formats like JSON or regex patterns. |
| Outlines | ★ 14k | A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid. |
| Instructor | ★ 13.2k | Reliable structured outputs from any LLM, validated with Pydantic |
| BAML | ★ 8.4k | A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages. |
| Marvin | ★ 6.2k | A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects. |
| LM Format Enforcer | ★ 2k | A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step. |
| XGrammar | ★ 1.8k | A fast, portable engine for grammar-constrained decoding that guarantees LLM output follows a given structure, used inside many inference servers. |