Overview
OpenAI Guardrails (Python) is a package that adds configurable safety and compliance checks to LLM applications. It works as a drop-in wrapper around OpenAI's Python client, so you can validate and moderate model inputs and outputs without rewriting your existing calls.
It is aimed at developers building chatbots, agents, and other LLM features who need to catch problems like unsafe content, PII, jailbreak attempts, or off-topic responses. You point it at a config file that defines which checks to run, and a failed check raises a tripwire exception you can handle.
As a guardrail framework, it sits between your application and the model. It also integrates with the OpenAI Agents SDK and ships an evaluation framework so you can measure how your guardrails perform on labeled datasets.
What it does
- Drop-in GuardrailsOpenAI client that replaces the standard OpenAI client and works with both Chat Completions and the Responses API
- Built-in checks including Moderation, Contains PII, URL Filter, Jailbreak, NSFW Text, Off Topic Prompts, and Hallucination Detection
- Tripwire model: a triggered guardrail raises GuardrailTripwireTriggered so you can catch and handle violations
- GuardrailAgent integration for the OpenAI Agents SDK
- Evaluation framework to benchmark guardrails on JSONL datasets, including model comparison and ROC curves
- Configuration driven by a JSON config file generated via the guided setup at guardrails.openai.com
Getting started
Install the package, create a guardrail config, then swap in the GuardrailsOpenAI client. Most users generate their config with the guided setup at guardrails.openai.com.
Install the package
Install openai-guardrails from PyPI.
pip install openai-guardrailsUse the drop-in client
Replace the standard OpenAI client with GuardrailsOpenAI and pass your config file. A triggered guardrail raises GuardrailTripwireTriggered.
from pathlib import Path
from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered
# Use GuardrailsOpenAI instead of OpenAI
client = GuardrailsOpenAI(config=Path("guardrail_config.json"))
try:
chat = client.chat.completions.create(
model="gpt-5",
messages=[{"role": "user", "content": "Hello world"}],
)
print(chat.choices[0].message.content)
except GuardrailTripwireTriggered as e:
print(f"Guardrail triggered: {e}")Integrate with the Agents SDK (optional)
Use GuardrailAgent to attach guardrails to an OpenAI Agents SDK agent.
from pathlib import Path
from guardrails import GuardrailAgent
agent = GuardrailAgent(
config=Path("guardrails_config.json"),
name="Customer support agent",
instructions="You are a customer support agent. You help customers with their questions.",
)Commands and code are distilled from the project's own documentation — always check the official repo for the latest.
When to use it
- Add moderation and PII detection to a customer-facing chatbot without rewriting existing OpenAI calls
- Block jailbreak attempts and keep an agent's responses within its intended business scope
- Validate model outputs for NSFW or off-topic content before they reach end users
- Benchmark and compare guardrail configurations on labeled datasets before shipping
How OpenAI Guardrails compares
OpenAI Guardrails alongside other open-source guardrails & security tools AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Microsoft Presidio | ★ 9.3k | A framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers. |
| Guardrails AI | ★ 7k | A Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users. |
| NeMo Guardrails | ★ 6.5k | NVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers. |
| GLiNER | ★ 3.3k | A small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio. |
| LLM Guard | ★ 3.1k | A security toolkit from Protect AI with 35+ input and output scanners that sanitize prompts and responses for prompt injection, toxicity, PII leakage, and harmful content. |
| Rebuff | ★ 1.5k | A prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application. |
| Detoxify | ★ 1.3k | Pretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs. |
| OpenAI Guardrails | ★ 215 | Drop-in safety and moderation checks for your OpenAI client |