Instructor

Reliable structured outputs from any LLM, validated with Pydantic

github.com/567-labs/instructor · ★ 13.2k · python.useinstructor.com

Overview

Instructor is a library for getting structured, validated data out of large language models. You define the shape you want as a Pydantic model, pass it as a response_model, and Instructor returns a typed object instead of raw text you have to parse yourself. It is built on Pydantic, so you get validation, type hints, and IDE autocompletion for the result.

It is aimed at developers who call an LLM to extract or classify information and need the output to match a fixed schema every time. Instead of writing JSON schemas by hand, parsing responses, and handling validation errors, you write one model and let the library handle the rest. When a response fails validation, Instructor retries the call automatically with the error attached.
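
To make the "no hand-written JSON schema" point concrete, here is a minimal sketch that uses only Pydantic and makes no LLM call. It shows the schema that falls out of a model definition. The `User` model mirrors the one used in the examples on this page. How Instructor passes this schema to a given provider is not shown and may differ between providers.

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# Pydantic derives the JSON Schema from the type hints alone.
schema = User.model_json_schema()

print(schema["properties"]["age"]["type"])  # integer
print(schema["required"])                   # ['name', 'age']
```

Changing a field on the model changes the schema with it, so the two cannot drift apart.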

As a structured-output tool, it wraps the provider client rather than replacing it. The same code works across OpenAI, Anthropic, Google, Ollama, and others through a single from_provider interface, and the project ships SDKs for Python, TypeScript, Ruby, Go, Elixir, and Rust.

What it does

Define output shape as a Pydantic model and get a validated, typed object back
One from_provider interface works across OpenAI, Anthropic, Google, Ollama, Groq, and more
Automatic retries that re-send the validation error when output fails to validate
Streaming support via Partial[Model] to receive fields as they are generated
Extracts nested and list-based structures, not just flat objects
Available in Python, TypeScript, Ruby, Go, Elixir, and Rust
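
As an illustration of the nested and list-based structures mentioned above, the following sketch validates a hand-written dictionary that stands in for a model response. The `Order` and `LineItem` models and the sample data are hypothetical. In real use the same `Order` class would be passed as response_model instead of being validated by hand.

```python
from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    sku: str
    quantity: int


class Order(BaseModel):
    customer: str
    line_items: list[LineItem]  # a list of nested models


# Hypothetical stand-in for what an LLM call would return.
raw = {
    "customer": "John",
    "line_items": [
        {"sku": "A-1", "quantity": 2},
        {"sku": "B-7", "quantity": 1},
    ],
}

order = Order.model_validate(raw)
total_units = sum(item.quantity for item in order.line_items)
print(total_units)  # 3

# A malformed nested item is rejected, and the error records where it sits.
try:
    Order.model_validate(
        {"customer": "John", "line_items": [{"sku": "A-1", "quantity": "many"}]}
    )
    bad_path = None
except ValidationError as exc:
    bad_path = exc.errors()[0]["loc"]

print(bad_path)  # ('line_items', 0, 'quantity')
```

The error path points at the exact nested field, which is the kind of detail that can be fed back to the model on a retry.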

Getting started

Install the package, define a model for the data you want, and call the client with response_model set to that model.

Install Instructor

Install from PyPI with pip, or add it with your package manager.

bash

pip install instructor

Extract structured data

Define a Pydantic model, create a client with from_provider, and pass the model as response_model. The result is a validated, typed object.

python

import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)

print(user)  # name='John' age=25

Retry on validation failure

Add validators to your model and set max_retries. Instructor re-sends the call with the error message when validation fails.

python

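import instructor
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    age: int

    # Reject negative ages; a failure here triggers a retry.
    @field_validator('age')
    @classmethod
    def validate_age(cls, v):
        if v < 0:
            raise ValueError('Age must not be negative')
        return v

# Same client as in the previous example.
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    max_retries=3,
)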
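
The retry idea can be sketched without a network call. The code below is not Instructor's implementation. It is a simplified loop, written under the assumption that a retry means "validate the reply, and on failure append the error text to the conversation and ask again". `fake_llm`, `extract_with_retries`, and `max_attempts` are hypothetical names introduced for illustration only.

```python
import json

from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0:
            raise ValueError("Age must not be negative")
        return v


def fake_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for a provider call.

    Returns an invalid age first, then a valid one once an error
    message has been fed back into the conversation.
    """
    saw_error = any("Validation error" in m["content"] for m in messages)
    return json.dumps({"name": "John", "age": 25 if saw_error else -25})


def extract_with_retries(messages: list[dict], max_attempts: int = 3):
    """Validate each reply; on failure, append the error and ask again."""
    messages = list(messages)  # do not mutate the caller's list
    for attempt in range(1, max_attempts + 1):
        reply = fake_llm(messages)
        try:
            return User.model_validate_json(reply), attempt
        except ValidationError as exc:
            messages.append({"role": "assistant", "content": reply})
            messages.append(
                {"role": "user", "content": f"Validation error, please fix:\n{exc}"}
            )
    raise RuntimeError("no valid response within the attempt budget")


user, attempts = extract_with_retries(
    [{"role": "user", "content": "John is 25 years old"}]
)
print(user.age, attempts)  # 25 2
```

The first reply fails the `age` validator, the error is appended to the messages, and the second reply passes, so the loop returns after two attempts.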

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

Pull structured fields (name, price, in-stock status) out of free-form product or document text
Build extraction pipelines that must return data matching a fixed schema on every call
Swap LLM providers without rewriting parsing code, since the same model and call work across providers
Stream partial results into a UI as the model fills in each field

How Instructor compares

Instructor alongside other open-source structured output tools AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Guidance	★ 21.5k	A programming model that interleaves generation, prompting, and control logic to constrain output and enforce formats like JSON or regex patterns.
Outlines	★ 14k	A library for structured generation that constrains an LLM's token output to match a JSON schema, regex, or grammar so the result is always valid.
Instructor	★ 13.2k	Reliable structured outputs from any LLM, validated with Pydantic
BAML	★ 8.4k	A domain-specific language for defining LLM functions with typed schemas, parsing flexible model output into reliable structured data across many languages.
Marvin	★ 6.2k	A Python toolkit from Prefect for turning LLM calls into typed functions that extract, classify, and cast text into structured Python objects.
LM Format Enforcer	★ 2k	A library that enforces an output format such as JSON schema or regex by filtering the tokens an LLM is allowed to generate at each step.
XGrammar	★ 1.8k	A fast, portable engine for grammar-constrained decoding that guarantees LLM output follows a given structure, used inside many inference servers.
