In plain English
You want an LLM to give you JSON your code can reliably parse. There are three distinct mechanisms for doing that — and they are not interchangeable. Each comes with a different level of guarantee, a different API surface, and a different intended job. Mixing them up is one of the most common sources of subtle production bugs in LLM-powered apps.
JSON mode is the oldest option. You tell the model "please respond in JSON" and it will produce syntactically valid JSON — curly braces balanced, strings quoted, no trailing commas. But it says nothing about which keys appear or what types they have. The schema is entirely up to the model's best guess from the prompt.
Structured outputs is the modern upgrade. You supply a full JSON Schema and the API guarantees the response matches it — every required field present, every type correct, no extra keys, no invented enum values. Under the hood the inference engine compiles your schema into a grammar and masks out invalid tokens at sampling time, so violations are impossible by construction.
Function calling (also called tool use) is built for a different job: letting the model trigger actions in your code. You describe a set of functions the model is allowed to call, and when it decides to invoke one, the API returns a structured argument payload — not the final answer. The primary guarantee is about action selection and argument shape, not about returning data to the user.
A helpful analogy: JSON mode is like asking a waiter to "write down my order in some kind of list format" — they'll probably do it, but you can't bet your order-processing system on the exact shape. Structured outputs is like handing them a pre-printed form with labeled boxes — the waiter physically cannot skip a required field. Function calling is like giving the kitchen staff a button panel — when the waiter presses the right button, the kitchen fires a specific action with whatever parameters are on the slip.
Why it matters
Choosing the wrong mechanism causes real production failures. Teams that ship with JSON mode discover in production that the model occasionally adds a "notes" key they didn't ask for, omits a required field when the context is long, or returns an enum value that isn't in their allowlist. Every one of those cases means a try/catch, a retry, and added latency — or worse, silent data corruption.
Choosing function calling when you just want structured data adds complexity without benefit: you have to define a dummy tool, handle the tool-call turn in your message loop, and reassemble the final answer — when response_format with a schema would have done the job in one round trip.
Going the other way — using response_format when you need the model to decide which of several actions to take — means you lose the model's ability to route to the right tool and pass typed arguments. You end up parsing an action name out of a JSON blob and then dispatching yourself, which is exactly what function calling already does for you.
How each mechanism works
All three mechanisms sit on the same underlying token-generation pipeline. What differs is how much control you hand back to the inference engine over what tokens are valid at each position.
JSON mode: syntactic validity only
JSON mode (response_format: {type: "json_object"} in OpenAI's API) constrains the model so that its output parses as JSON. OpenAI also requires the word "JSON" to appear somewhere in the messages, so the instruction to produce JSON still has to come from your prompt. The constraint is shallow: it ensures the overall structure is a JSON value, but does not enforce any particular set of keys, value types, or nesting depth. The model may still omit keys or hallucinate fields the prompt didn't request, and older or less capable models do so more often. A response cut off by the token limit (finish_reason: "length") can also be incomplete JSON.
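To make the gap between "parses" and "has the shape I need" concrete, here is a minimal sketch of the check you still owe after a JSON-mode call. The reply string, the EXPECTED_FIELDS table, and the check_shape helper are all made up for illustration; no API call is involved.

```python
import json

# Hypothetical JSON-mode reply: syntactically valid, but not the shape the prompt asked for.
raw_reply = '{"city": "Paris", "population": "2.1 million", "notes": "approximate"}'

# The fields our (hypothetical) application expects, with their Python types.
EXPECTED_FIELDS = {"name": str, "country": str, "population": int}


def check_shape(payload: str) -> list[str]:
    """Return a list of problems found in a JSON-mode reply (empty if none)."""
    data = json.loads(payload)  # succeeds here: JSON mode covers syntax only
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    for key in sorted(data.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected key: {key}")
    return problems


problems = check_shape(raw_reply)
for problem in problems:
    print(problem)
```

Every item this helper reports is something JSON mode lets through, which is why application code ends up carrying this kind of validation and a retry path.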
Structured outputs: grammar-based schema enforcement
Structured outputs (response_format: {type: "json_schema", json_schema: {name: "...", schema: {...}, strict: true}}) compiles your JSON Schema into a token-level grammar. At every sampling step, the inference engine masks out tokens that would violate the schema: if the current position expects a boolean, only tokens that spell true or false are allowed. The result is that a completed response cannot violate the schema. Two cases still need handling in code: a response truncated by the token limit (finish_reason: "length") may be incomplete JSON, and a safety refusal is reported in a separate refusal field rather than as schema-shaped content. OpenAI measured a perfect 100% schema-adherence rate on gpt-4o-2024-08-06 with structured outputs enabled, compared to under 40% for the earlier gpt-4-0613 without it.
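The masking step can be illustrated with a toy example. This is a sketch of the general idea only, not any provider's implementation: the five-token vocabulary, the scores, and the helper names are invented, and a real engine derives the allowed set from a compiled grammar rather than a hard-coded set.

```python
import math

# Toy vocabulary and raw next-token scores (logits). All values are made up.
VOCAB = ["true", "false", '"maybe"', "42", "null"]
logits = [1.2, 0.3, 2.5, 0.9, 1.8]


def mask_logits(scores, vocab, allowed):
    """Set the score of every token outside `allowed` to -inf."""
    return [s if tok in allowed else -math.inf for s, tok in zip(scores, vocab)]


def softmax(scores):
    """Convert scores to probabilities; -inf scores get probability 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Without a constraint, the highest-scoring token wins.
unconstrained_choice = VOCAB[logits.index(max(logits))]

# The schema says this position holds a boolean, so only two tokens are legal.
allowed_here = {"true", "false"}
probs = softmax(mask_logits(logits, VOCAB, allowed_here))
constrained_choice = VOCAB[probs.index(max(probs))]

print(unconstrained_choice, constrained_choice)
```

The masked tokens end up with probability zero, so sampling can only ever land on a schema-legal token, whatever the model's unconstrained preference was.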
Function calling: action routing with typed arguments
Function calling (tools: [...]) works differently. You supply one or more function definitions (name, description, parameter schema). The model reads the conversation and decides whether to respond normally or to call one of your functions. If it calls a function, the API returns a special message with finish_reason: "tool_calls" containing the function name and a JSON argument payload. Your code must handle this turn: call the actual function, add the result to the message history, and send the conversation back to the model to get the final answer. Setting strict: true inside the function definition activates the same grammar-based enforcement as structured outputs — so the arguments will exactly match the parameter schema.
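The "your code must handle this turn" step usually comes down to a small dispatcher. The sketch below assumes the tool call has already been converted to a plain dictionary with the same fields described above; the registry, the canned get_city_info function, the population figure, and the call ID are placeholders for illustration.

```python
import json


def get_city_info(city_name: str) -> dict:
    """Stand-in for a real lookup; returns canned placeholder data."""
    return {"city_name": city_name, "population": 2_100_000}


# Registry mapping tool names (as declared to the model) to Python callables.
TOOL_REGISTRY = {"get_city_info": get_city_info}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and build the `tool` message to send back."""
    name = tool_call["function"]["name"]
    if name not in TOOL_REGISTRY:
        raise ValueError(f"model requested unknown tool: {name}")
    # The arguments arrive as a JSON string, not as a parsed object.
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Hypothetical tool call, shaped like the payload described above.
example_call = {
    "id": "call_123",
    "type": "function",
    "function": {"name": "get_city_info", "arguments": '{"city_name": "Paris"}'},
}
tool_message = run_tool_call(example_call)
print(tool_message)
```

Keeping the registry explicit means an unexpected tool name fails loudly instead of being silently ignored.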
JSON mode:
- Syntactically valid JSON
- No schema enforcement
- Keys can be missing or invented
- One API round trip
- Works on all JSON-capable models
Structured outputs:
- Schema-exact JSON, guaranteed
- Grammar masks invalid tokens
- All required fields always present
- One API round trip
- Requires modern model versions
Function calling:
- Typed argument payload
- Optional strict mode for args
- Model decides which tool to call
- Multi-turn: needs a follow-up call
- Built for action dispatch, not data return
Side-by-side in code
The three mechanisms look similar in code but carry very different semantics. The examples below use the OpenAI Python SDK to show the key differences at the call site.
JSON mode
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Return info about Paris as JSON."}],
    response_format={"type": "json_object"},  # valid JSON guaranteed, schema is not
)
# Still need to check that the keys you expect are actually present
data = json.loads(response.choices[0].message.content)

Structured outputs
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class CityInfo(BaseModel):
    name: str
    country: str
    population: int
    capital: bool

# .parse() sends the Pydantic schema and returns a typed object; no json.loads needed
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Return info about Paris."}],
    response_format=CityInfo,
)
message = response.choices[0].message
if message.refusal:  # a refusal carries no parsed object
    raise RuntimeError(message.refusal)
city: CityInfo = message.parsed
print(city.population)  # int, guaranteed

Function calling
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_city_info",
        "description": "Look up current data for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city_name": {"type": "string"}
            },
            "required": ["city_name"],
            "additionalProperties": False
        },
        "strict": True  # arguments will exactly match the schema
    }
}]
messages = [{"role": "user", "content": "What's the population of Paris?"}]
# First turn: model decides whether to call the tool
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
assistant_message = response.choices[0].message
if not assistant_message.tool_calls:
    # With tool_choice="auto" the model may answer directly instead
    print(assistant_message.content)
else:
    tool_call = assistant_message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)  # {"city_name": "Paris"}
    # Your code runs the actual lookup here (fetch_city_from_database is your own function)
    result = fetch_city_from_database(args["city_name"])
    # Second turn: send the result back so the model can answer
    messages += [
        assistant_message,  # the assistant message containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
    ]
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)

Which one should you use?
The simplest rule: if you are extracting or generating data for your own code to consume, use structured outputs. If you need the model to decide which action to take and pass typed arguments to that action, use function calling. Only fall back to JSON mode when you are on an older model that does not support structured outputs.
| Scenario | Best mechanism |
|---|---|
| Extract fields from a document (name, date, amount) | Structured outputs |
| Classify a support ticket into predefined categories | Structured outputs |
| Generate a form value set from a user description | Structured outputs |
| Let the model fetch current weather from your API | Function calling |
| Build an agent that can search, read, and write files | Function calling |
| Route user intent to one of several backend actions | Function calling |
| Quick prototype on gpt-3.5-turbo or an older endpoint | JSON mode |
| Extract data AND the model might also need to call a tool | Structured outputs + tools with strict: true |
Anthropic and Google have converged on the same pattern. Claude launched structured outputs in November 2025 (output_config.format for response JSON and strict: true for tool parameters) on Claude Sonnet 4.5 and Opus 4.1, using the same grammar-compilation approach. The Gemini API uses response_schema with a JSON Schema object (set together with a JSON response MIME type) for controlled generation and supports Pydantic/Zod schemas via the Google GenAI SDK. In all three ecosystems the underlying idea is the same: the schema constrains decoding so that schema-violating tokens are excluded. The supported subset of JSON Schema differs by provider, so check each provider's documentation before reusing a schema.
Parallel tool calls require extra care. When parallel_tool_calls is enabled (the default on most OpenAI models), the model may call multiple tools in one turn. OpenAI documents that structured outputs are not compatible with parallel function calls: when the model generates a parallel call, the arguments may not match the strict schemas you supplied. If strict argument schemas are critical, set parallel_tool_calls: false.
Going deeper
Once you are comfortable with the three mechanisms, the next layer of nuance is about schema design, provider differences, and streaming.
Schema design constraints for structured outputs
Not all JSON Schema features are supported in strict mode. OpenAI's grammar-based enforcement requires that additionalProperties is set to false on every object and that every property is listed in required; to make a field optional, allow null as one of its types instead of leaving it out of required. Unsupported keywords (like if/then, not, contains) are either rejected with a validation error at request time or silently ignored; in neither case does the problem surface at generation time. Always test your schema against the provider's supported keyword list before shipping to production.
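A small pre-flight check can catch the two object-level rules before a request is sent. The helper below is a hypothetical sketch, not part of any SDK: it only walks properties and array items, and it does not cover anyOf, $defs, or the provider's list of unsupported keywords.

```python
def strict_mode_problems(schema: dict, path: str = "$") -> list[str]:
    """Recursively collect violations of the two object rules described above."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {', '.join(missing)}")
        for name, sub_schema in props.items():
            problems.extend(strict_mode_problems(sub_schema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems


# Example schema with two deliberate mistakes: "address" is not listed in the
# top-level "required", and the nested object lacks additionalProperties: false.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "required": ["name"],
    "additionalProperties": False,
}
issues = strict_mode_problems(schema)
for issue in issues:
    print(issue)
```

Running a check like this in a unit test gives the same fail-fast behavior as the request-time validation error, without spending an API call.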
Streaming with structured outputs
Structured outputs can be streamed token-by-token just like plain completions. The schema enforcement happens on the server at sampling time, so the text received so far is always a valid prefix of a schema-conforming document. However, because JSON must be complete to parse, a standard JSON parser will reject intermediate chunks. The common patterns are: buffer the full stream and parse once at the end, or use a streaming JSON parser (like JSONStream in Node.js) that can emit partial objects as fields complete. The Pydantic/Zod .stream() helpers in the official SDKs handle this automatically.
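The buffering pattern is easy to see with a simulated stream. The chunks below are hand-written stand-ins for streamed deltas; in a real client they would come from the SDK's stream iterator, which is not shown here.

```python
import json

# Hypothetical streamed chunks of one schema-constrained response.
chunks = ['{"name": "Pa', 'ris", "popul', 'ation": 2100', '000, "capital": true}']

buffer = ""
incomplete_prefixes = 0
for chunk in chunks:
    buffer += chunk
    try:
        json.loads(buffer)  # only succeeds once the document is complete
    except json.JSONDecodeError:
        incomplete_prefixes += 1  # a legal prefix, but not yet parseable

# Parse once, after the stream has finished.
city = json.loads(buffer)
print(incomplete_prefixes, city)
```

The try/except inside the loop is there only to demonstrate that prefixes fail to parse; production code would normally skip it and parse once at the end.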
Function calling with structured output arguments
When you add strict: true to a function definition, structured-output enforcement applies to the arguments payload of tool calls — the same grammar-based masking runs over the argument JSON at generation time. This means you get both benefits: the model routes to the right tool (function calling's job) and the arguments are schema-exact (structured outputs' guarantee). For any production agent, you should always set strict: true on function definitions for supported models.
When constrained decoding can hurt quality
Grammar-based masking is nearly free in compute terms once the schema has been compiled, although OpenAI notes that the first request with a new schema incurs extra latency while it is processed. The larger risk is to quality: very tight schemas can occasionally cause the model to produce valid-but-wrong values. If a field is constrained to an enum and none of the enum values naturally fit the context, the model is forced to pick one; it cannot say "I don't know". Mitigations: add an "unknown" or "other" member to enums where applicable, use nullable fields for optional data, and make the schema only as strict as your downstream code actually needs.
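As an illustration of those mitigations, here is a sketch of a ticket-classification schema with both escape hatches. The field names and categories are invented, and the conforms helper is a hand-rolled check for this one schema, not a general JSON Schema validator.

```python
# Hypothetical schema: "other" gives the enum an escape hatch, and order_id
# is nullable so the model is not forced to invent an identifier.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "bug", "feature_request", "other"],
        },
        "order_id": {"type": ["string", "null"]},
    },
    "required": ["category", "order_id"],
    "additionalProperties": False,
}


def conforms(value: dict) -> bool:
    """Check a candidate output against TICKET_SCHEMA (specific to this schema)."""
    if set(value) != set(TICKET_SCHEMA["required"]):
        return False
    if value["category"] not in TICKET_SCHEMA["properties"]["category"]["enum"]:
        return False
    return value["order_id"] is None or isinstance(value["order_id"], str)


# A ticket that fits no named category and mentions no order can still be
# represented honestly.
honest_output = {"category": "other", "order_id": None}
print(conforms(honest_output))
```

Downstream code can then treat "other" and null as explicit signals for human review instead of trusting a forced guess.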
FAQ
Does JSON mode guarantee the model returns the keys I asked for in the prompt?
No. JSON mode only guarantees syntactically valid JSON — balanced braces, quoted strings, no trailing commas. If you ask for name, date, and amount in the prompt, the model will usually include them, but it can omit them, rename them, or add extras. Only structured outputs (with type: "json_schema" and strict: true) enforces that every required key is present and every type is correct.
Can I use structured outputs and function calling at the same time?
Yes. You can pass both tools and response_format in the same request if you want the model to be able to call tools and its final response to be schema-constrained. More commonly, you add strict: true to function definitions so the tool-call arguments are schema-exact. These two features are complementary, not mutually exclusive.
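A request that combines the two might be assembled as in the sketch below. It only builds the keyword arguments and sends nothing; the schema name city_answer, the answer fields, and the build_request helper are assumptions for illustration, and the layout follows OpenAI's Chat Completions conventions.

```python
def build_request(question: str) -> dict:
    """Assemble keyword arguments for a chat completion that uses both features."""
    lookup_tool = {
        "type": "function",
        "function": {
            "name": "get_city_info",
            "description": "Look up current data for a city",
            "strict": True,  # tool-call arguments must match the schema
            "parameters": {
                "type": "object",
                "properties": {"city_name": {"type": "string"}},
                "required": ["city_name"],
                "additionalProperties": False,
            },
        },
    }
    answer_format = {
        "type": "json_schema",
        "json_schema": {
            "name": "city_answer",  # hypothetical schema name
            "strict": True,  # the final answer must match the schema
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["name", "population"],
                "additionalProperties": False,
            },
        },
    }
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": question}],
        "tools": [lookup_tool],
        "response_format": answer_format,
    }


request = build_request("What's the population of Paris?")
print(sorted(request))
```

With an OpenAI client in hand, these arguments would be passed as client.chat.completions.create(**request); the tool-call turn still has to be handled before the schema-constrained final answer arrives.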
Does function calling with strict: true give the same guarantee as structured outputs?
For the argument payload of a tool call, yes — strict: true activates the same grammar-based constrained decoding on the argument JSON. The difference is semantic: function calling is about action routing, and the schema applies to the arguments passed to the chosen function. Structured outputs via response_format applies to the model's final text response.
Why would I ever still use JSON mode in 2025 or 2026?
JSON mode remains useful on older model versions that predate structured outputs (for example gpt-4-turbo, gpt-3.5-turbo, or provider-hosted open-source models). It is also simpler when you only need any valid JSON and don't care about the exact shape — for example, as a quick guard against the model returning a plain string response. For new projects on current models, prefer structured outputs.
Do Anthropic and Google support structured outputs the same way OpenAI does?
The concept is the same (supply a JSON Schema, get schema-exact output via grammar-based decoding), but the API surface differs. OpenAI uses response_format: {type: "json_schema", json_schema: {name: "...", schema: {...}, strict: true}}. Anthropic (as of November 2025) uses output_config.format with a beta header. Google Gemini uses the response_schema field in generation_config. All three compile the schema to a grammar and mask invalid tokens at decode time.
What happens if my JSON Schema uses features that the provider doesn't support in strict mode?
Behavior varies by provider. OpenAI's strict mode requires additionalProperties: false on every object and all properties in required; submitting a schema that violates these rules returns a request-time validation error (not a generation-time error, which is a helpful fail-fast). Other unsupported keywords may be silently ignored, meaning the guarantee degrades to best-effort. Always consult the provider's structured-outputs documentation for the exact list of supported JSON Schema keywords.