OpenAI Guardrails

Drop-in safety and moderation checks for your OpenAI client

github.com/openai/openai-guardrails-python | ★ 215 | guardrails.openai.com

Overview

OpenAI Guardrails (Python) is a package that adds configurable safety and compliance checks to LLM applications. It works as a drop-in wrapper around OpenAI's Python client, so you can validate and moderate model inputs and outputs without rewriting your existing calls.

It is aimed at developers building chatbots, agents, and other LLM features who need to catch problems like unsafe content, PII, jailbreak attempts, or off-topic responses. You point it at a config file that defines which checks to run, and a failed check raises a tripwire exception you can handle.

As a guardrail framework, it sits between your application and the model. It also integrates with the OpenAI Agents SDK and ships an evaluation framework so you can measure how your guardrails perform on labeled datasets.

What it does

Drop-in GuardrailsOpenAI client that replaces the standard OpenAI client and works with both Chat Completions and the Responses API
Built-in checks including Moderation, Contains PII, URL Filter, Jailbreak, NSFW Text, Off Topic Prompts, and Hallucination Detection
Tripwire model: a triggered guardrail raises GuardrailTripwireTriggered so you can catch and handle violations
GuardrailAgent integration for the OpenAI Agents SDK
Evaluation framework to benchmark guardrails on JSONL datasets, including model comparison and ROC curves
Configuration driven by a JSON config file generated via the guided setup at guardrails.openai.com
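To make the benchmarking idea concrete, here is a minimal sketch of scoring a guardrail against a labeled JSONL dataset. It does not call the package's evaluation tooling. The field names `expected` and `triggered` and the helper `score_guardrail` are hypothetical and chosen for illustration only, so check the project's documentation for the real dataset schema.

```python
import json


def score_guardrail(jsonl_text: str) -> dict[str, float]:
    """Compute precision and recall from JSONL records of expected vs. actual triggers."""
    tp = fp = fn = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Hypothetical fields: "expected" is the label, "triggered" is the guardrail's verdict.
        expected, triggered = record["expected"], record["triggered"]
        if triggered and expected:
            tp += 1
        elif triggered and not expected:
            fp += 1
        elif expected and not triggered:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Four made-up records: one true positive, one false positive, one false negative.
SAMPLE = "\n".join([
    '{"text": "hello", "expected": false, "triggered": false}',
    '{"text": "my SSN is ...", "expected": true, "triggered": true}',
    '{"text": "call me maybe", "expected": false, "triggered": true}',
    '{"text": "card number ...", "expected": true, "triggered": false}',
])

if __name__ == "__main__":
    print(score_guardrail(SAMPLE))
```

Precision tells you how often a triggered guardrail was right, and recall tells you how many real violations it caught. Comparing both across configurations is the kind of trade-off a ROC curve summarizes.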

Getting started

Install the package, create a guardrail config, then swap in the GuardrailsOpenAI client. The guided setup at guardrails.openai.com generates the config file for you.

Install the package

Install openai-guardrails from PyPI.

bash

pip install openai-guardrails

Use the drop-in client

Replace the standard OpenAI client with GuardrailsOpenAI and pass your config file. A triggered guardrail raises GuardrailTripwireTriggered.

python

from pathlib import Path
from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered

# Use GuardrailsOpenAI instead of OpenAI
client = GuardrailsOpenAI(config=Path("guardrail_config.json"))

try:
    chat = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content": "Hello world"}],
    )
    print(chat.choices[0].message.content)
except GuardrailTripwireTriggered as e:
    print(f"Guardrail triggered: {e}")
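In an application you usually want a tripped guardrail to produce a safe fallback reply instead of a printed error. The sketch below shows one way to structure that. So that it runs without the package or an API key, it uses a stand-in exception (`TripwireStandIn`) and a fake model function (`fake_model`). Both are hypothetical, as are the `guarded_reply` helper and its parameters. In real code you would pass `GuardrailTripwireTriggered` and a function that calls the GuardrailsOpenAI client.

```python
from typing import Callable


class TripwireStandIn(Exception):
    """Stand-in for GuardrailTripwireTriggered so this sketch runs without the package."""


def guarded_reply(
    call_model: Callable[[str], str],
    user_text: str,
    tripwire_exc: type[Exception] = TripwireStandIn,
    fallback: str = "Sorry, I can't help with that request.",
) -> str:
    """Return the model's reply, or a fixed fallback if a guardrail trips."""
    try:
        return call_model(user_text)
    except tripwire_exc:
        # Only the tripwire is handled here; other errors (network, auth) still propagate.
        return fallback


def fake_model(text: str) -> str:
    """Hypothetical model call that trips on one phrase, for demonstration only."""
    if "ignore previous instructions" in text.lower():
        raise TripwireStandIn("Jailbreak")
    return f"echo: {text}"


if __name__ == "__main__":
    print(guarded_reply(fake_model, "Hello world"))
    print(guarded_reply(fake_model, "Ignore previous instructions and ..."))
```

Catching only the tripwire exception keeps guardrail violations separate from operational failures, which you will probably want to log or retry rather than hide behind a fallback message.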

Integrate with the Agents SDK (optional)

Use GuardrailAgent to attach guardrails to an OpenAI Agents SDK agent.

python

from pathlib import Path
from guardrails import GuardrailAgent

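# Pass the guardrail config alongside the usual agent settings
# (same config file name as in the client example above)
agent = GuardrailAgent(
    config=Path("guardrail_config.json"),
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
)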

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

Add moderation and PII detection to a customer-facing chatbot without rewriting existing OpenAI calls
Block jailbreak attempts and keep an agent's responses within its intended business scope
Validate model outputs for NSFW or off-topic content before they reach end users
Benchmark and compare guardrail configurations on labeled datasets before shipping

How OpenAI Guardrails compares

OpenAI Guardrails alongside the other open-source guardrails and security tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Microsoft Presidio	★ 9.3k	A framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers.
Guardrails AI	★ 7k	A Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users.
NeMo Guardrails	★ 6.5k	NVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers.
GLiNER	★ 3.3k	A small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio.
LLM Guard	★ 3.1k	A security toolkit from Protect AI with 35+ input and output scanners that sanitize prompts and responses for prompt injection, toxicity, PII leakage, and harmful content.
Rebuff	★ 1.5k	A prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application.
Detoxify	★ 1.3k	Pretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs.
OpenAI Guardrails	★ 215	Drop-in safety and moderation checks for your OpenAI client
