AI/TLDR

Invariant Guardrails

Rule-based guardrails that inspect agent tool calls and messages to block unsafe behavior

Overview

Invariant Guardrails is a rule-based layer for securing LLM and MCP-powered agents. It sits between your application and your MCP servers or LLM provider as a proxy, checking and intercepting tool calls and messages as they pass through. You add protections by writing rules rather than changing your agent's code.

Rules are written in a Python-inspired matching language. A rule describes a pattern - for example, a flow where an agent reads the user's inbox and then tries to email an unknown address - and raises an error when a trace matches it. The project ships a standard library of operations and detectors (such as prompt-injection detection) you can call inside rules.

It fits the guardrail-framework category for teams that want policy enforcement they can read and audit. You can run it transparently as an MCP or LLM proxy via the Invariant Gateway, or call it directly in Python with the invariant-ai package to evaluate rules against an agent trace locally.

What it does

  • Python-inspired matching rules that raise a named error when an agent trace matches an unsafe pattern
  • Inspects both LLM messages (user and assistant) and tool calls, and can match flows between calls (for example, get_inbox followed by send_email)
  • Deploys as a transparent MCP or LLM proxy through the Invariant Gateway, with no invasive code changes to your agent
  • Programmatic API via the invariant-ai package (LocalPolicy) to load and evaluate rules against a trace entirely on your machine
  • Standard library of operations and built-in detectors, including prompt_injection, for use inside rules
  • Open-source project by Invariant Labs with reference documentation for the rule-writing language

Getting started

You can run Guardrails programmatically with the invariant-ai package to load rules and analyze an agent trace locally. The example below mirrors the project README.

Install the package

Install the invariant-ai package, which exposes the analyzer used to load and evaluate rules.

bashbash
pip install invariant-ai

Write a rule and analyze a trace

Load a policy from a rule string, then call analyze() on a list of messages. The rule below raises an error if send_email is called after a get_website output that looks like a prompt injection.

pythonpython
from invariant.analyzer import LocalPolicy

policy = LocalPolicy.from_string("""
from invariant.detectors import prompt_injection

raise "Don't use send_email after get_website" if:
    (output: ToolOutput) -> (call2: ToolCall)
    output is tool:get_website
    prompt_injection(output.content, threshold=0.7)
    call2 is tool:send_email
""")

result = policy.analyze(messages)
print(result)

Run it as a proxy via Gateway

To enforce rules on live traffic without code changes, integrate Guardrails through the Invariant Gateway, which evaluates your rules before and after each LLM and MCP request. See the project documentation for setup.

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Block prompt-injection attacks where tool output (such as a fetched web page) tries to steer an agent into sending an email or taking another sensitive action
  • Stop data-exfiltration flows, for example reading the user's inbox and then emailing contents to an address outside your domain
  • Enforce content policies by scanning all LLM messages for banned phrases and rejecting requests that violate them
  • Monitor and steer MCP-powered agents in production by running rules as a transparent proxy instead of editing agent code

How Invariant Guardrails compares

Invariant Guardrails alongside other open-source guardrails & security tools AI/TLDR tracks, ranked by GitHub stars.

ToolStarsWhat it does
Microsoft Presidio★ 9.3kA framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers.
Guardrails AI★ 7kA Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users.
NeMo Guardrails★ 6.5kNVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers.
GLiNER★ 3.3kA small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio.
LLM Guard★ 3.1kA security toolkit from Protect AI with 35+ input and output scanners that sanitize prompts and responses for prompt injection, toxicity, PII leakage, and harmful content.
Rebuff★ 1.5kA prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application.
Detoxify★ 1.3kPretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs.
Invariant Guardrails★ 429Rule-based guardrails that inspect agent tool calls and messages to block unsafe behavior