In plain English
An LLM is a text-completion engine. It produces a string. Your application needs data — a structured object, a verified sentiment label, a safe HTML snippet, a valid SQL query. The gap between "some text" and "data my app can use" is exactly what output validation fills.
Output validation is the set of checks and transformations you run after the model responds and before your app acts on the response. It answers three questions: (1) Is this the right shape? (2) Is this safe to pass downstream? (3) If neither, what do I do instead?
The everyday analogy: imagine you ordered a pizza by phone and the restaurant read your order back to you. Validation is you checking, before you hang up, that they got the size, toppings, and delivery address right — and asking them to repeat it if they didn't. You're not redoing the whole order; you're just confirming the contract before committing.
Why it matters
Models fail in predictable ways. They return prose when you asked for JSON. They include a markdown code fence around the JSON you asked for, breaking JSON.parse. They truncate a list mid-item when a response is long. They put a field in the wrong type — a number as a string, a date as a Unix timestamp when you expected ISO 8601. They occasionally hallucinate fields that don't exist in your schema.
Each failure mode is harmless if you catch it before your app acts on it. It becomes a production incident if you don't. An unparseable JSON string passed to a downstream function raises an unhandled exception. A hallucinated SQL column becomes a database error. An unsanitized HTML string from a model, rendered directly in a browser, is an XSS vector; the OWASP Top 10 for LLM Applications lists this class of flaw under insecure (later renamed improper) output handling.
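To make these failure modes concrete, here is a minimal sketch using only the standard library. The three raw responses are hypothetical examples written for illustration, not captured model output.

```python
import json

# Hypothetical raw responses illustrating three of the failure modes above.
FENCED = '```json\n{"label": "positive", "confidence": 0.9}\n```'
TRUNCATED = '{"items": ["alpha", "bet'
WRONG_TYPE = '{"label": "positive", "confidence": "0.9"}'

def try_parse(raw: str):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, f"{exc.msg} at char {exc.pos}"

for name, raw in [("fenced", FENCED), ("truncated", TRUNCATED), ("wrong type", WRONG_TYPE)]:
    parsed, error = try_parse(raw)
    print(f"{name}: parsed={parsed!r} error={error!r}")
```

The first two responses fail loudly at parse time. The third is the dangerous one: it parses cleanly, and only a type check reveals that confidence arrived as a string.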
The cost of skipping validation
- Runtime crashes: downstream code assumes a typed field and gets undefined or a malformed string.
- Silent data corruption: the value parses but is semantically wrong (e.g., a confidence score stored as a string instead of a float).
- Security vulnerabilities: model output passed to a shell, SQL query, or web page without sanitization creates injection attack surfaces.
- Bad user experience: the model's apology message ("I'm sorry, I can't…") is displayed raw instead of a graceful fallback.
How it works
A production output-validation pipeline has three stages that run in order after every model call: parse, validate, and act on failure. The diagram below shows the full flow.
Stage 1 — Parse: extracting structure from text
The first job is getting from raw text to a Python dict (or equivalent). Three common approaches, in order of reliability:
- Constrained decoding (most reliable, local models only): libraries like Outlines use a finite-state machine to mask invalid tokens during generation, so the model literally cannot produce output that violates the schema. Zero retries needed, but requires self-hosting.
- Native structured output (reliable for hosted APIs): OpenAI, Anthropic, and Gemini all support a response_format or tool-call mode that constrains the model server-side. In strict modes the returned output is constrained to match your schema, although a refusal or a response cut off at the token limit can still leave you without usable data. Use this when available.
- Post-hoc JSON extraction (fallback): strip markdown fences with a regex, then call JSON.parse / json.loads. This is the least reliable approach: it works most of the time but needs a retry strategy for the failures.

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Strip markdown fences, then parse JSON."""
    # Remove ```json ... ``` or ``` ... ``` wrappers
    cleaned = re.sub(r"^```[\w]*\n?", "", raw.strip(), flags=re.MULTILINE)
    cleaned = re.sub(r"```$", "", cleaned.strip())
    return json.loads(cleaned)
```

Stage 2 — Schema validation: enforcing types and constraints
Once you have a Python dict, you need to confirm it matches your contract — all required fields present, correct types, values within acceptable ranges. Pydantic is the standard tool for this in Python. Define a model that describes the expected output; Pydantic raises a ValidationError with field-level detail when anything is wrong.

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class SentimentResult(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str = Field(min_length=10)

def validate_output(raw_dict: dict) -> SentimentResult | None:
    try:
        return SentimentResult(**raw_dict)
    except ValidationError as e:
        # e.errors() gives field-level detail, which is useful in retry prompts
        print(f"Validation failed: {e.errors()}")
        return None
```

Stage 3 — Act on failure: retries, fallbacks, and escalation
When parsing or validation fails, you have three choices: retry with error feedback, use a fallback value, or escalate to a human. Most production systems use all three in sequence.
Retry loops and fallback strategies
The standard pattern for handling a validation failure is to re-prompt the model, this time including the error message, so the model can self-correct. Figures reported for production chatbots put first-pass validation failures at about 8% of responses, with the second-pass failure rate dropping to under 1% when the specific Pydantic error is included in the retry prompt. Treat these numbers as indicative, since the rates depend heavily on the model, prompt, and schema.

```python
import json
from pydantic import BaseModel, ValidationError

MAX_RETRIES = 3

def call_with_validation(prompt: str, llm_fn, schema_cls: type[BaseModel]) -> BaseModel:
    """Call the model, validate the reply, and re-prompt with the error on failure."""
    messages = [{"role": "user", "content": prompt}]
    last_error = None
    for attempt in range(MAX_RETRIES):
        raw = llm_fn(messages)  # returns the model's text
        try:
            data = json.loads(raw)
            return schema_cls(**data)  # raises ValidationError on bad schema
        except (json.JSONDecodeError, ValidationError, TypeError) as e:
            # TypeError covers valid JSON that is not an object (e.g. a list)
            last_error = str(e)
            # Feed the error back so the model knows what to fix
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Your response failed validation: {last_error}. "
                           f"Please respond again with valid JSON matching the schema.",
            })
    raise RuntimeError(f"Output validation failed after {MAX_RETRIES} attempts: {last_error}")
```

Libraries like Instructor and Pydantic AI automate exactly this pattern. Instructor wraps any OpenAI-compatible API: you pass a Pydantic model as response_model and set max_retries, and the library handles prompt construction, parsing, and the retry loop. It is reported to have over 3 million monthly downloads, which makes it one of the most widely used tools for this pattern.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.from_openai(OpenAI())

class SentimentResult(BaseModel):
    label: str
    confidence: float

result = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=SentimentResult,
    max_retries=3,  # retries with error feedback automatically
    messages=[{"role": "user", "content": "Analyse the sentiment of: 'Great product!'"}],
)
print(result.label, result.confidence)
```

Fallback hierarchy
Retries add latency: each retry is another full model round trip, typically around 1.5–3 seconds on a hosted API. Decide upfront on a fallback hierarchy, meaning what the app does when retries are exhausted.
- Safe default: return a known-good static value or a neutral sentinel (e.g., {"label": "unknown", "confidence": 0.0}) rather than crashing.
- Graceful degradation: show the raw model text to the user with a disclaimer instead of the structured data, so the interaction still completes.
- Human-in-the-loop: queue the failed output for manual review if the stakes are high (financial decisions, medical advice, legal content).
- Circuit breaker: if more than X% of calls to a model are failing validation in a rolling window, switch to a backup model or disable the feature until investigated.
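The hierarchy above can be sketched in a few lines. This is one possible arrangement, not a reference implementation: ValidationCircuitBreaker, classify_with_fallback, validated_call, and the thresholds are all hypothetical names and values chosen for illustration, and validated_call is assumed to raise RuntimeError once its own retries are exhausted.

```python
from collections import deque

SAFE_DEFAULT = {"label": "unknown", "confidence": 0.0}

class ValidationCircuitBreaker:
    """Track recent validation outcomes and trip when too many fail."""

    def __init__(self, window: int = 50, max_failure_rate: float = 0.2, min_calls: int = 10):
        self.outcomes = deque(maxlen=window)  # True means the call passed validation
        self.max_failure_rate = max_failure_rate
        self.min_calls = min_calls

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)

    @property
    def is_open(self) -> bool:
        if len(self.outcomes) < self.min_calls:
            return False  # not enough data to judge yet
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) > self.max_failure_rate

def classify_with_fallback(text, validated_call, breaker, review_queue=None):
    """Try the validated call; return a copy of the safe default on failure."""
    if breaker.is_open:
        return dict(SAFE_DEFAULT)  # feature disabled until someone investigates
    try:
        result = validated_call(text)
    except RuntimeError:
        breaker.record(False)
        if review_queue is not None:
            review_queue.append(text)  # human-in-the-loop for high-stakes inputs
        return dict(SAFE_DEFAULT)
    breaker.record(True)
    return result
```

In this sketch the breaker never closes again once it opens, because no new outcomes are recorded. A production version would need a reset timer or a half-open probe, and would likely switch to a backup model rather than always returning the default.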
Content filters and sanitization
Schema validation confirms the output is the right shape. Content filtering confirms the output is safe to use. The two are complementary: a response can be perfectly valid JSON and still contain a customer's email address, a racial slur, or a competitor recommendation that violates your policy.
Common content checks
| Check | What it catches | Typical tool |
|---|---|---|
| PII detection | Names, emails, SSNs, phone numbers in model output | Regex + NER models (spaCy, AWS Comprehend, Azure PII) |
| Toxicity filter | Hate speech, explicit content, threats | OpenAI Moderation API, Perspective API, Llama Guard |
| HTML/script sanitization | XSS payloads when output is rendered in browser | bleach (Python), DOMPurify (JS) |
| SQL injection scan | Dangerous SQL when model generates queries | Parameterized queries; never concatenate model text into SQL |
| Hallucination check | Claims that contradict grounding documents | LLM-as-a-judge or retrieval-based consistency check |
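As an illustration of the regex half of the PII row, here is a minimal redaction sketch. The patterns are deliberately simple, US-centric assumptions written for this example; they will miss many real formats, and names or addresses still need an NER model.

```python
import re

# Illustrative patterns only; real deployments need locale-aware rules plus NER.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with a placeholder and report which kinds were found."""
    found = []
    for kind, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{kind}]", text)
        if count:
            found.append(kind)
    return text, found

redacted, kinds = redact_pii("Contact jane.doe@example.com or 555-123-4567, SSN 123-45-6789.")
print(redacted)  # Contact [EMAIL] or [PHONE], SSN [SSN].
print(kinds)     # ['EMAIL', 'SSN', 'PHONE']
```

Returning the list of detected kinds lets the caller decide between redacting the span and discarding the whole response.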
Sanitizing for downstream context
The right sanitization strategy depends on where the output goes, not just what it contains. Four common downstream contexts each have different rules:
- Rendered HTML: strip all tags except a known-safe allowlist using bleach.clean() or DOMPurify.sanitize(). Never trust innerHTML with raw model output.
- Database writes: use parameterized queries or an ORM. Never concatenate model text into a SQL string, even if the model was asked for a query.
- Shell/subprocess calls: avoid passing model output to a shell entirely. If unavoidable, use an allowlist of permitted commands and reject everything else.
- Downstream API calls: validate that model-generated API parameters (URLs, identifiers, amounts) are in the expected format and within permitted ranges before sending.
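For the database-write case in the list above, the sketch below uses the standard library's sqlite3 module to show model text entering a query only as a bound parameter. The customers table and the find_customers helper are hypothetical, made up for this example.

```python
import sqlite3

def find_customers(conn: sqlite3.Connection, model_city: str) -> list[tuple]:
    """The SQL text is fixed; model output is only ever a bound parameter."""
    return conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name",
        (model_city,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("Ada", "Paris"), ("Bo", "Oslo")])

print(find_customers(conn, "Paris"))
# A hostile value is treated as data, not SQL, so it simply matches nothing.
print(find_customers(conn, "Paris'; DROP TABLE customers; --"))
```

Binding protects values only. Table and column names cannot be bound, so if the model chooses those, check them against a fixed allowlist instead.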
For the rendered-HTML case, a minimal allowlist with bleach looks like this:

```python
import bleach

ALLOWED_TAGS = ["b", "i", "em", "strong", "p", "ul", "ol", "li"]

def sanitize_for_html(model_output: str) -> str:
    """Strip any tags not in the allowlist before rendering in a browser."""
    return bleach.clean(model_output, tags=ALLOWED_TAGS, strip=True)
```

Going deeper
Constrained decoding vs post-hoc validation
For teams running self-hosted models, constrained decoding (Outlines, llama.cpp grammar mode) is architecturally superior to retry loops: the model cannot emit a token that violates the schema, so there is nothing to retry for structural reasons. The finite-state machine (FSM) approach masks tokens at sampling time: if the current partial JSON requires the next token to be a digit, all non-digit tokens get probability zero. The output is schema-valid as long as generation runs to completion (a response cut off at the token limit can still be incomplete), at the cost of a modest speed penalty, on the order of 5–10%, from the FSM overhead.
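The masking step can be shown with a toy example. This is a sketch of the idea only, not how Outlines or llama.cpp are implemented: the FSM covers just the fragment "one or more digits, then a closing brace", the logits are invented numbers, and greedy argmax stands in for real sampling.

```python
import math

DIGITS = {"0", "1", "7"}

# Toy FSM for the fragment <digit>+ "}" : state -> {allowed token: next state}
FSM = {
    "need_digit": {tok: "in_number" for tok in DIGITS},
    "in_number": {**{tok: "in_number" for tok in DIGITS}, "}": "done"},
}

def masked_argmax(logits: dict[str, float], state: str) -> str:
    """Set disallowed tokens to -inf, then pick the best remaining token."""
    allowed = FSM[state]
    masked = {tok: (score if tok in allowed else -math.inf) for tok, score in logits.items()}
    return max(masked, key=masked.get)

def generate(logit_steps: list[dict[str, float]]) -> str:
    """Greedy decoding in which the FSM state decides which tokens survive."""
    state, out = "need_digit", []
    for logits in logit_steps:
        if state == "done":
            break
        tok = masked_argmax(logits, state)
        out.append(tok)
        state = FSM[state][tok]
    return "".join(out)

# Invented logits: the unconstrained model would prefer " hello", then "a".
steps = [
    {" hello": 5.0, "7": 1.0, "a": 0.5, "}": 0.2},
    {"a": 4.0, "1": 2.0, "}": 1.0},
    {"}": 3.0, "0": 1.0},
]
print(generate(steps))  # 71}
```

At each step the model's top choice is overruled whenever it would break the grammar, which is why no retry is ever needed for structural errors.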
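For hosted APIs, use native structured output modes (OpenAI's response_format: {type: "json_schema"}, Anthropic's tool-use with a typed input schema) before falling back to post-hoc parsing. Where the provider implements the mode with constrained decoding, as OpenAI documents for strict json_schema output, you get the reliability benefit without self-hosting. Where the guarantee is weaker or undocumented, treat the mode as a strong hint and keep validating.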
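As a rough illustration, the payload for a json_schema response format can be built as a plain dictionary. The envelope below follows the shape OpenAI documents for Chat Completions as best understood at the time of writing; check it against the current API reference before relying on it. The build_response_format helper and the schema name are hypothetical.

```python
import json

# JSON Schema mirroring the SentimentResult model from Stage 2.
SENTIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
        "reasoning": {"type": "string"},
    },
    # Strict mode expects every property to be required and no extras allowed.
    "required": ["label", "confidence", "reasoning"],
    "additionalProperties": False,
}

def build_response_format(name: str, schema: dict) -> dict:
    """Wrap a JSON Schema in the json_schema response_format envelope."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }

response_format = build_response_format("sentiment_result", SENTIMENT_SCHEMA)
print(json.dumps(response_format, indent=2))
```

Range constraints such as confidence between 0 and 1 are left out of the schema on purpose here, since support for them in strict modes varies; they are better enforced afterwards in Pydantic.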
Validation as an LLM-as-a-judge step
Schema validation catches structural errors, but not semantic ones. For high-stakes outputs (medical summaries, legal drafts, financial recommendations), a second model call that checks the first response against a rubric, a pattern known as LLM-as-a-judge, can catch confident hallucinations that pass schema validation cleanly. See LLM as a Judge Explained for the mechanics.
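A judge step can be sketched as one more validated call. In the example below, judge_fn is a hypothetical callable (prompt text in, model text out), and the prompt wording and verdict shape are assumptions made for illustration rather than a recommended rubric.

```python
import json

JUDGE_PROMPT = (
    "You are reviewing a model-written summary against its source.\n"
    "Source:\n{source}\n\nSummary:\n{summary}\n\n"
    'Reply with JSON only: {{"supported": true|false, "issues": ["..."]}}'
)

def judge_summary(source: str, summary: str, judge_fn) -> dict:
    """Ask a second model whether the summary is supported by the source.

    Any unparseable or malformed verdict is treated as a failure, so the
    judge's own output goes through validation too.
    """
    raw = judge_fn(JUDGE_PROMPT.format(source=source, summary=summary))
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return {"supported": False, "issues": ["judge returned unparseable output"]}
    if not isinstance(verdict, dict) or not isinstance(verdict.get("supported"), bool):
        return {"supported": False, "issues": ["judge verdict had the wrong shape"]}
    issues = verdict.get("issues")
    return {"supported": verdict["supported"], "issues": issues if isinstance(issues, list) else []}
```

Failing closed is the key design choice: if the judge itself misbehaves, the original response is not waved through.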
Observability: log every validation failure
Every parse error, schema violation, and content-filter hit is a signal about your prompt quality. Instrument your validation layer to emit structured logs with the model used, the prompt version, the failure type, and the raw response. Review these weekly — a spike in JSON parse errors after a prompt change is far cheaper to catch in logs than in production incidents.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("llm.validation")

def log_validation_failure(
    prompt_version: str,
    model: str,
    failure_type: str,  # "parse_error" | "schema_error" | "content_filter"
    raw_output: str,
    error_detail: str,
) -> None:
    """Emit one structured (JSON) log line per validation failure."""
    logger.warning(json.dumps({
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
        "ts": datetime.now(timezone.utc).isoformat(),
        "prompt_version": prompt_version,
        "model": model,
        "failure_type": failure_type,
        "error": error_detail,
        # Truncate raw output to avoid flooding logs with huge responses
        "raw_output_preview": raw_output[:500],
    }))
```

Choosing the right library
The ecosystem has converged on a few well-maintained options:
- Instructor — best for hosted APIs (OpenAI, Anthropic, Gemini, and any OpenAI-compatible endpoint). Minimal code change, automatic retry with error feedback, TypeScript port available.
- Pydantic AI — agent-oriented; the output type is declared as part of the agent, with retry and tool-calling baked in. Better choice when you're building a multi-step agent rather than a single call.
- Outlines — best for local/self-hosted models where you need zero-retry guaranteed schema compliance. Requires ownership of the inference stack.
- Guardrails AI — broader scope than just output validation; combines schema validation with a content-policy pipeline. Higher setup cost, more useful when you need both structural and semantic checks from one library.
FAQ
Why does my LLM sometimes return prose instead of JSON even when I ask for JSON?
Models are trained to be helpful and conversational. When they're uncertain or the prompt is ambiguous, they default to natural language. Use a native structured output mode (e.g., response_format with a JSON schema) to enforce structure at the API level, or use a constrained-decoding library for local models. Always include a concrete JSON example in your prompt as a fallback — it shifts the model's probability distribution toward compliant output.
How many retries should I allow before giving up?
Two to three retries is the practical sweet spot in production. One retry with error feedback catches the vast majority of transient failures. Beyond three retries, failure is usually a prompt-design problem — the model doesn't understand what you need — not a stochastic blip. More retries also add user-facing latency: each retry on a hosted API typically adds 1.5–3 seconds.
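One way to keep retries from inflating user-facing latency is to cap them by wall-clock time as well as by count. The sketch below is an assumption-laden illustration: retry_with_budget and attempt_fn are hypothetical names, attempt_fn is assumed to raise ValueError on a validation failure, and the six-second budget is arbitrary.

```python
import time

def retry_with_budget(attempt_fn, max_retries: int = 2, budget_seconds: float = 6.0,
                      clock=time.monotonic):
    """Call attempt_fn until it succeeds, retries run out, or the time budget is spent."""
    deadline = clock() + budget_seconds
    last_error = None
    attempts_made = 0
    for attempt in range(1 + max_retries):
        if attempt > 0 and clock() >= deadline:
            break  # no time left for another model round trip
        attempts_made += 1
        try:
            return attempt_fn(attempt)  # expected to return a validated result
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {attempts_made} attempt(s): {last_error}")
```

The clock parameter exists only so the deadline logic can be tested without sleeping.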
What is the difference between output validation and output sanitization?
Validation checks that the output matches the expected structure and values: right shape, right types, required fields present. Sanitization transforms or constrains the output so unsafe content cannot reach a downstream system: stripping HTML tags, binding values as query parameters instead of splicing them into SQL, redacting PII. You normally need both: validate first, then sanitize the validated output before passing it on.
Does using structured output mode (JSON mode) mean I don't need Pydantic validation?
No. Plain JSON mode only guarantees syntactically valid JSON. Schema-constrained output (response_format: json_schema in strict mode) goes further and guarantees conformance to the schema, which prevents missing fields and wrong types. Neither reliably enforces business-rule constraints (e.g., confidence must be between 0 and 1, date must be in the future), and neither filters harmful content. Pydantic adds those deeper checks on top of the structural guarantee.
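The business-rule layer can be as small as a function that returns a list of violations. This sketch uses only the standard library; the due_date field and the check_business_rules helper are hypothetical, and in practice the same rules would usually live in Pydantic field constraints and validators.

```python
from datetime import date

def check_business_rules(data: dict, today: date) -> list[str]:
    """Return human-readable rule violations; an empty list means the data passed."""
    problems = []
    confidence = data.get("confidence")
    if not isinstance(confidence, (int, float)) or isinstance(confidence, bool):
        problems.append("confidence must be a number")
    elif not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence {confidence} is outside [0, 1]")
    try:
        due = date.fromisoformat(data.get("due_date", ""))
    except (TypeError, ValueError):
        problems.append("due_date must be an ISO 8601 date (YYYY-MM-DD)")
    else:
        if due <= today:
            problems.append(f"due_date {due} must be in the future")
    return problems
```

Passing today in as an argument keeps the check deterministic, and the returned messages can be fed straight into a retry prompt.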
How do I catch PII leaking in LLM responses?
Combine a regex pass (for well-structured patterns like emails and phone numbers) with an NER model (for names and addresses) that runs as a post-processing step. Cloud options include Amazon Comprehend's DetectPiiEntities operation and the PII detection feature in Azure AI Language (formerly part of Azure Cognitive Services). If a PII hit is found, either redact the sensitive span with a placeholder or discard the response and return a fallback.
Should I validate inputs or outputs, or both?
Both — for different reasons. Input validation (prompts, user messages) prevents injection attacks and runaway costs from unexpectedly long inputs. Output validation ensures the model's response is safe and correctly structured before your app acts on it. The two layers are complementary; skipping either leaves a gap.
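On the input side, even a small guard helps with the cost half of that answer. The sketch below is a minimal example with an assumed character limit; it does nothing to detect prompt injection, which needs separate defenses. check_user_input and MAX_INPUT_CHARS are hypothetical names.

```python
MAX_INPUT_CHARS = 4000  # assumed budget; tune to your model and cost limits

def check_user_input(text: str) -> str:
    """Normalize a user message and reject empty or oversized input."""
    # Drop non-printable control characters but keep newlines and tabs.
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("input is empty")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(cleaned)} chars; limit is {MAX_INPUT_CHARS}")
    return cleaned
```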