AI/TLDR

What Are Structured Outputs in LLMs?

Understand how JSON mode and schema-constrained generation let you treat an LLM's response as reliable structured data your code can directly consume.

Intermediate | 11 min read | Updated 2026-06-12

In plain English

When you ask an LLM a question in a chat window, a natural-language answer is perfect. But when your code needs to consume the answer — parse it, store it in a database, pass it to another function — a paragraph of prose is a nightmare. You need a predictable shape: a JSON object with specific keys, specific value types, and no surprises.

Structured outputs is the collective name for features that force the model to emit text that fits a machine-readable format you specify. The most common target is JSON. Instead of hoping the model writes valid JSON on its own (it usually does, but not always), you hand the API a schema — a precise description of every field and its type — and the API guarantees the response matches it.

Think of it like ordering a sandwich at a counter with a form versus shouting your order at a chef. With the form, every box must be filled in correctly before the order goes through. With the shout, you might get close — but sometimes the chef hears "no onions" as "extra onions" and your code breaks at 2 AM.

Why it matters

LLMs are excellent at reasoning and generating text, but their outputs are naturally unstructured. When an LLM drives application logic — extracting data from a document, classifying support tickets, generating form values — you need to parse its output before anything else happens. Without structured outputs, that parsing is fragile.

Common failure modes without structured outputs include: missing keys the model "forgot", hallucinated enum values that aren't in your allowlist, nested fields that appear sometimes but not always, and trailing prose after the closing brace that breaks JSON.parse. Each requires defensive code and retries that cost time and money.
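To make that cost concrete, here is a minimal sketch of the defensive parsing that free-form replies tend to require. The helper name `parse_loose_json`, the key set, the allowlist, and the sample reply are all hypothetical; only the standard-library `json` module is used.

```python
import json

# Hypothetical expectations for an "event" payload (illustration only).
REQUIRED_KEYS = {"title", "day"}
ALLOWED_DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}


def parse_loose_json(text: str) -> dict:
    """Best-effort parse of a model reply that may wrap JSON in prose."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode stops at the end of the first JSON value, so trailing
    # prose after the closing brace does not break parsing.
    obj, _end = json.JSONDecoder().raw_decode(text[start:])
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if obj["day"] not in ALLOWED_DAYS:
        raise ValueError(f"value outside allowlist: {obj['day']!r}")
    return obj


reply = 'Sure! {"title": "Team standup", "day": "Friday"} Let me know if you need more.'
event = parse_loose_json(reply)
```

Every branch here is code you write, test, and maintain only because the output shape is not guaranteed, and each raised error usually means another paid retry.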

The business impact is real. Predictable output schemas eliminate entire classes of runtime errors in production pipelines, reduce retry costs, and remove most defensive parsing boilerplate (you still need to handle refusals and responses cut off by a token limit). Teams at Shopify, Zapier, and Retool reported 90%+ reductions in API parsing errors after adopting structured outputs, according to OpenAI's case studies.

Structured outputs also unlock a whole category of use cases: data extraction pipelines, automated form filling, multi-step agent workflows that pass structured results between steps, and function calling where tool arguments need to match an exact interface. In all of these, a schema guarantee is the difference between a reliable product and a fragile prototype.

How it works

There are two complementary mechanisms used to produce structured outputs: constrained decoding on the server side, and schema validation on the client side. Understanding both helps you pick the right approach and debug failures.

Constrained decoding

At each step, an LLM produces a probability distribution over all possible next tokens. Normally the sampler picks freely from that distribution. With constrained decoding, the inference engine masks out any token that would make the output violate the schema. If the schema says the next value must be true or false, tokens like "hello" are zeroed out of the distribution before sampling. The model never even gets to try an invalid value.

This is implemented via a formal grammar derived from the JSON schema. The grammar tracks what characters are valid at every position in the output. After {"status": it knows which value types the schema allows for that key (say, a string or a number), so no other kind of token is ever sampled. The result is that the output is structurally valid by construction, not by post-hoc filtering.
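The masking step can be illustrated with a toy example. This is a sketch under simplifying assumptions: the four-token vocabulary, the logit values, and the `masked_softmax` helper are invented for illustration, and a real inference engine works on the full tokenizer vocabulary with a compiled grammar rather than a hand-written set of allowed tokens.

```python
import math


def masked_softmax(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Softmax over only the tokens the grammar allows; all others get probability 0."""
    kept = {tok: logit for tok, logit in logits.items() if tok in allowed}
    if not kept:
        raise ValueError("grammar allows no token in the vocabulary")
    peak = max(kept.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(logit - peak) for tok, logit in kept.items()}
    total = sum(exps.values())
    return {tok: exps.get(tok, 0.0) / total for tok in logits}


# Toy next-token logits after the prefix {"is_public":
logits = {"hello": 3.0, "true": 1.0, "false": 0.5, '"': 2.0}
allowed = {"true", "false"}  # the schema says this value is a boolean

probs = masked_softmax(logits, allowed)
```

Note that "hello" had the highest raw logit, yet it ends up with probability zero: the sampler can only choose between the two tokens the schema permits.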

API surface: response_format

On the API side, you pass your schema in the response_format parameter. OpenAI's approach (available on GPT-4o and newer) uses type: "json_schema" with a full JSON Schema object:

OpenAI Python SDK: schema-constrained output (Python)
from openai import OpenAI
import json

client = OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract the event details from the user message."},
        {"role": "user", "content": "Team standup on Friday at 10am in the Hudson room."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title":    {"type": "string"},
                    "day":      {"type": "string"},
                    "time":     {"type": "string"},
                    "location": {"type": "string"}
                },
                "required": ["title", "day", "time", "location"],
                "additionalProperties": False
            }
        }
    }
)

event = json.loads(response.choices[0].message.content)
print(event)  # e.g. {'title': 'Team standup', 'day': 'Friday', ...}

Pydantic and Zod: schema from your types

Writing JSON Schema by hand is tedious and error-prone. The official SDKs let you declare the schema using the type system you're already using. In Python, pass a Pydantic model class directly; the SDK serialises it to JSON Schema automatically. In TypeScript, use Zod. The output comes back already parsed and type-safe — no JSON.parse needed:

OpenAI + Pydantic: type-safe structured output (Python)
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Event(BaseModel):
    title: str
    day: str
    time: str
    location: str

# parse() sends the schema and parses the response in one call.
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract event details."},
        {"role": "user",   "content": "Team standup on Friday at 10am in the Hudson room."}
    ],
    response_format=Event,
)

event: Event = response.choices[0].message.parsed
print(event.title, event.day)  # already typed — no dict key access needed

Anthropic: tool use as structured output

Anthropic's Claude API supports structured outputs via two routes. The first is forcing a tool call: define a tool whose input schema is your desired JSON shape, then set tool_choice to force the model to call it. The tool arguments are the structured output, returned as an already-parsed JSON object. Without strict tool use the schema is followed closely but not enforced by the decoder, so validate the arguments if a missing or mistyped field would be costly. The second route is a dedicated beta header (anthropic-beta: structured-outputs-2025-11-13) that enables schema-constrained JSON output directly in the response, similar to OpenAI's response_format approach.

Anthropic Python SDK: forced tool call as structured output (Python)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

extract_tool = {
    "name": "extract_event",
    "description": "Extract event details from the message.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title":    {"type": "string"},
            "day":      {"type": "string"},
            "time":     {"type": "string"},
            "location": {"type": "string"}
        },
        "required": ["title", "day", "time", "location"]
    }
}

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "extract_event"},  # force the call
    messages=[{"role": "user", "content": "Team standup on Friday at 10am in the Hudson room."}]
)

event = message.content[0].input  # already a dict, not raw JSON string
print(event["title"], event["day"])

Provider comparison and limitations

Every major LLM provider supports some form of structured output, but the guarantees and syntax differ. Knowing the gaps prevents surprises in production.

| Provider | Feature name | Mechanism | Schema format | Strict guarantee? |
|---|---|---|---|---|
| OpenAI GPT-4o+ | Structured Outputs | Constrained decoding | JSON Schema via response_format | Yes, with strict: true |
| Anthropic Claude | Tool use / structured outputs beta | Constrained decoding (beta) or forced tool call | JSON Schema as tool input | Yes (beta) / only with strict tool use |
| Google Gemini | Controlled generation | Constrained decoding | JSON Schema via response_schema (with response_mime_type) | Yes |
| Mistral | JSON mode | Sampling guidance | response_format: json_object | Valid JSON, no schema |
| Cohere | Structured Outputs | Constrained decoding | JSON Schema via response_format | Yes |

Schema constraints to know

Not all JSON Schema features are supported by every provider's constrained decoder. Common limitations to watch out for:

  • additionalProperties: false is usually required; in OpenAI's strict mode, a schema that omits it is rejected with an error.
  • Recursive schemas (e.g. a tree node referencing itself) may be rejected or have depth limits.
  • anyOf / oneOf / if-then-else are partially supported; check provider docs before relying on them.
  • All properties must be listed under required when using OpenAI's strict: true — optional fields must use a union with null instead.
  • $ref and $defs for shared sub-schemas work on OpenAI but may not be supported elsewhere.
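Two of the rules above (additionalProperties: false and "every property listed under required") are easy to check before a request is sent. The following is a rough sketch, not any provider's validator: `strict_violations` is a hypothetical helper, it only walks plain object and array nodes, and it ignores $ref, anyOf, and type unions.

```python
def strict_violations(schema: dict, path: str = "$") -> list[str]:
    """Return human-readable problems for the two strict-mode rules checked here."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_violations(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_violations(schema["items"], f"{path}[]"))
    return problems


loose = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "location": {"type": "string"},
    },
    "required": ["title"],
}
# Same schema with both rules satisfied.
fixed = {**loose, "required": ["title", "location"], "additionalProperties": False}

problems = strict_violations(loose)
```

Running a check like this in CI can catch a schema change that would otherwise only fail at request time.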

Structured outputs vs. function calling

The two features overlap but serve different purposes. Structured outputs (direct response_format) produce a JSON response in the content field — useful when you just want data back. Function calling (tool use) models the LLM as an agent that calls a function with structured arguments — useful when the model decides which action to take and with what parameters. Under the hood, many providers implement structured outputs using the same constrained decoding engine as function calling. The conceptual split is: do you want data extraction (structured outputs) or action dispatch (function calling)?
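The split can be sketched on the consuming side. Everything below is hypothetical: the two ticket functions, the `TOOLS` registry, and the simplified `{"name": ..., "arguments": ...}` shape, which stands in for a provider's real tool-call object rather than reproducing any wire format.

```python
from typing import Any, Callable


def create_ticket(summary: str, priority: str) -> str:
    return f"created [{priority}] {summary}"


def close_ticket(ticket_id: int) -> str:
    return f"closed #{ticket_id}"


# Registry of actions the model is allowed to trigger.
TOOLS: dict[str, Callable[..., str]] = {
    "create_ticket": create_ticket,
    "close_ticket": close_ticket,
}


def dispatch(tool_call: dict[str, Any]) -> str:
    """Action dispatch: the model chose the function; we only route the call."""
    handler = TOOLS.get(tool_call["name"])
    if handler is None:
        raise KeyError(f"unknown tool: {tool_call['name']}")
    return handler(**tool_call["arguments"])


# Data extraction: the payload itself is the result; nothing gets called.
extracted = {"summary": "Login page returns 500", "priority": "high"}

# Action dispatch: a similar payload arrives together with the model's choice of tool.
tool_call = {"name": "create_ticket", "arguments": extracted}
result = dispatch(tool_call)
```

In the extraction case your code decides what happens next; in the dispatch case the model's choice of name drives control flow, which is why the arguments need to match the function's interface exactly.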

Going deeper

Performance tradeoffs

Constrained decoding adds a small overhead to each token step — the grammar automaton must be evaluated to compute the token mask. For simple flat schemas this is negligible. For deeply nested schemas with many enum values, the automaton can become large and first-token latency increases measurably. In practice, the overhead is far smaller than the cost of a retry caused by invalid output; the tradeoff almost always favours schema enforcement.

When structured outputs hurt reasoning

Forcing tight structure can cut off the model's ability to reason through a problem. A model that must immediately produce {"answer": "..."} has no room to think step by step before committing. The established pattern is chain-of-thought first, schema last: let the model reason in free text, either in a separate message or in a reasoning string field placed ahead of the answer fields, and only then produce the structured answer. Making the thinking itself a schema field is what lets chain-of-thought and structured outputs coexist in a single call.
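A reasoning-first schema might look like the sketch below. The field names, the `final_answer` helper, and the sample reply are illustrative assumptions, and the approach relies on the provider emitting keys in schema order, which is worth confirming in the provider's documentation.

```python
import json

# Property order matters: tokens are generated left to right, so the
# reasoning field has to come before the fields that depend on it.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string", "description": "Step-by-step working."},
        "answer": {"type": "integer", "description": "Final result only."},
    },
    "required": ["reasoning", "answer"],
    "additionalProperties": False,
}


def final_answer(raw: str) -> int:
    """Parse a schema-conforming reply and drop the scratchpad field."""
    payload = json.loads(raw)
    return payload["answer"]


# Hypothetical reply a model might produce under this schema.
raw_reply = '{"reasoning": "17 * 3 = 51, then 51 + 4 = 55.", "answer": 55}'
result = final_answer(raw_reply)
```

Downstream code reads only the answer field; the reasoning string exists to give the model room to work and can be logged or discarded.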

Streaming and partial JSON

You can combine streaming with structured outputs: the model emits the JSON token by token. The tradeoff is that a partially received JSON object is not valid JSON, so a standard parser rejects it until the stream completes. Some libraries (like instructor for Python) implement incremental JSON parsers that emit partially complete objects as fields arrive, enabling progressive UI updates. This is an advanced pattern; for most use cases, collect the full stream before parsing.
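To show the idea behind incremental parsing, here is a deliberately naive sketch: close whatever string and brackets are still open, then try a normal parse. It is not how instructor or any other library implements it, `parse_partial` is a hypothetical helper, and the chunk boundaries are made up.

```python
import json
from typing import Optional


def parse_partial(fragment: str) -> Optional[dict]:
    """Close any open string/brackets and try to parse; None if still unparseable."""
    closers = []       # stack of closing characters we still owe
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    if escaped:
        return None    # fragment ends mid-escape; wait for the next chunk
    repaired = fragment + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        result = json.loads(repaired)
    except json.JSONDecodeError:
        return None    # e.g. a key with no value yet
    return result if isinstance(result, dict) else None


# Simulated stream: keep the latest snapshot that parses.
chunks = ['{"title": "Team st', 'andup", "day": ', '"Friday"}']
buffer = ""
snapshots = []
for chunk in chunks:
    buffer += chunk
    snapshot = parse_partial(buffer)
    if snapshot is not None:
        snapshots.append(snapshot)
```

The middle chunk produces no snapshot because the buffer ends right after a colon, which is the kind of gap a UI has to tolerate. Partial string values (such as "Team st") are also visible before they are complete, so treat snapshots as previews rather than final data.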

The instructor library

For Python developers, the open-source instructor library wraps the OpenAI and Anthropic clients to make structured outputs even more ergonomic. It handles schema serialisation, sends the request, parses the response, and automatically retries with validation error feedback if the model's output doesn't match (useful for providers without native constrained decoding). It's widely adopted and a good starting point if you're doing heavy data extraction work.
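The retry-with-feedback loop described above can be sketched without any SDK. This is not instructor's code or API: `extract_with_retries`, `validate_event`, and the `call_model` callable are hypothetical stand-ins, and the fake model exists only so the loop can run offline.

```python
import json
from typing import Callable


def validate_event(payload: dict) -> list[str]:
    """Hypothetical validator: returns error messages (empty list if OK)."""
    errors = []
    for key in ("title", "day"):
        if not isinstance(payload.get(key), str):
            errors.append(f"'{key}' must be a string")
    return errors


def extract_with_retries(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Ask, validate, and on failure re-ask with the errors appended to the prompt."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            payload = json.loads(raw)
            if isinstance(payload, dict):
                errors = validate_event(payload)
            else:
                errors = ["expected a JSON object"]
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        if not errors:
            return payload
        feedback = "; ".join(errors)
        current_prompt = f"{prompt}\n\nYour previous reply was rejected: {feedback}. Try again."
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Fake model: first reply is missing a field, second one is correct.
replies = iter(['{"title": "Team standup"}', '{"title": "Team standup", "day": "Friday"}'])
prompts_seen = []


def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


event = extract_with_retries(fake_model, "Extract the event.")
```

Feeding the validation errors back gives the model a concrete target for its second attempt, which tends to work better than repeating the same prompt unchanged.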

Schema design best practices

  • Keep schemas flat when possible — nested objects multiply the chance of constrained-decoding edge cases.
  • Use enums for categorical fields — enums are fully constrained; free strings can still hallucinate unexpected values.
  • Add description to every field — the model reads field descriptions as instructions; they steer quality even with constrained decoding.
  • Test with adversarial inputs — complex schemas may trigger latency spikes or refusals; probe them before production.
  • Version your schemas — as your application evolves, old stored outputs may not match a new schema; plan for migration.
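Several of these practices can be combined in one schema. The sketch below is an illustration only: the ticket fields, the assumed v1 record shape (a free-text priority), and `migrate_v1_to_v2` are all hypothetical.

```python
# Flat schema, enum for the categorical field, a description on every field,
# and an explicit version tag stored alongside the data.
TICKET_SCHEMA_V2 = {
    "type": "object",
    "properties": {
        "schema_version": {"type": "integer", "description": "Always 2 for this schema."},
        "category": {
            "type": "string",
            "enum": ["billing", "bug", "feature_request"],
            "description": "Best-fitting category for the ticket.",
        },
        "urgent": {
            "type": "boolean",
            "description": "True only if the user reports an outage.",
        },
    },
    "required": ["schema_version", "category", "urgent"],
    "additionalProperties": False,
}


def migrate_v1_to_v2(record: dict) -> dict:
    """Upgrade a stored v1 record (assumed shape: free-text 'priority') to v2."""
    if record.get("schema_version") == 2:
        return record
    return {
        "schema_version": 2,
        "category": record["category"],
        "urgent": record.get("priority") == "high",
    }


old_record = {"category": "bug", "priority": "high"}
new_record = migrate_v1_to_v2(old_record)
```

Storing the version with each record lets old outputs be upgraded lazily on read instead of requiring a one-off backfill.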

FAQ

What is the difference between JSON mode and structured outputs?

JSON mode (the older "response_format": {"type": "json_object"}) tells the model to produce syntactically valid JSON, but doesn't constrain what keys or types appear. Structured outputs go further: you supply a JSON Schema, and the API guarantees the response matches it exactly — correct field names, correct types, no missing required keys. Structured outputs are strictly stronger.

How do I force an LLM to return JSON?

With OpenAI, pass response_format={"type": "json_schema", "json_schema": {...}} in your API call with strict: true. With Anthropic, either define a tool whose input schema is your desired shape and force the model to call it with tool_choice, or use the structured-outputs beta header. With Gemini, set response_mime_type to application/json and provide a schema through response_schema. All three approaches are designed to return valid, schema-matching JSON, though a refusal or a response truncated by the token limit can still leave you without a complete object.

Does structured output work with Pydantic or Zod?

Yes. The OpenAI Python SDK has a client.beta.chat.completions.parse() method that accepts a Pydantic model class directly — it serialises the class to JSON Schema, sends the request, and returns the response already parsed as a typed Pydantic object. The TypeScript SDK does the same with Zod schemas. This is the recommended path for production use.

Can structured outputs handle optional fields?

Yes, but the syntax is counterintuitive. With OpenAI's strict: true, every property must be in the required array. To make a field optional in practice, declare its type as ["string", "null"] (a union with null) — then the model can emit null when the value is absent. Leaving a field out of required is not allowed in strict mode.
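The conversion is mechanical enough to automate. Here is a small sketch, assuming a flat schema whose properties each declare a type; `to_strict` is a hypothetical helper, not part of any SDK.

```python
import copy


def to_strict(schema: dict) -> dict:
    """Return a copy where optional properties become required-but-nullable."""
    strict = copy.deepcopy(schema)
    required = set(strict.get("required", []))
    for name, prop in strict["properties"].items():
        if name not in required:
            declared = prop["type"]
            types = declared if isinstance(declared, list) else [declared]
            if "null" not in types:
                prop["type"] = [*types, "null"]
    strict["required"] = list(strict["properties"])
    strict["additionalProperties"] = False
    return strict


loose = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "location": {"type": "string"},  # optional in the original design
    },
    "required": ["title"],
}
strict = to_strict(loose)
```

On the consuming side, a null location then means "not mentioned", so code that previously checked for a missing key should check for None instead.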

Does using structured outputs hurt the model's reasoning quality?

It can, if you force the model to emit a tightly-structured answer immediately. The fix is to include a reasoning field (type: string) as the first property in your schema so the model can think before committing to the final answer fields. Alternatively, use a two-call pattern: first call generates free-text reasoning, second call produces the structured extraction from that reasoning.

Can I stream structured outputs token by token?

Yes, technically — the model emits the JSON characters one token at a time. But a partially-received JSON string is invalid and unparseable until the stream completes. For most use cases you should wait for the full response. If you need progressive UI updates, use a library like instructor that implements incremental JSON parsing, or design your schema so meaningful fields come early in the object.

Further reading