AI/TLDR

LLM Guard

A security toolkit that scans LLM prompts and responses for unsafe content

Overview

LLM Guard is an open-source security toolkit from Protect AI that sits between your application and a large language model. It runs a set of scanners over user prompts before they reach the model and over the model's responses before they reach the user, flagging or cleaning content that breaks your rules.

It is meant for developers building production LLM apps who need a guardrail layer they control. You pick which scanners to run, and LLM Guard returns the sanitized text along with per-scanner validity flags and risk scores so your code can decide whether to block, allow, or log the request.
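LLM Guard leaves the final block, allow, or log decision to your code. The sketch below shows one possible policy. It assumes the validity flags and risk scores arrive as dictionaries keyed by scanner name. The function name `decide` and the `log_threshold` cut-off are hypothetical, not part of LLM Guard.

```python
from typing import Dict


def decide(
    results_valid: Dict[str, bool],
    results_score: Dict[str, float],
    log_threshold: float = 0.5,  # hypothetical cut-off for "suspicious but allowed"
) -> str:
    """Return "block", "log", or "allow" from per-scanner results.

    Hypothetical policy: block if any scanner marked the text invalid,
    log if any risk score reaches log_threshold, otherwise allow.
    """
    if not all(results_valid.values()):
        return "block"
    if any(score >= log_threshold for score in results_score.values()):
        return "log"
    return "allow"


print(decide({"Toxicity": True, "PromptInjection": False}, {"Toxicity": 0.1, "PromptInjection": 0.9}))  # block
print(decide({"Toxicity": True}, {"Toxicity": 0.6}))  # log
print(decide({"Toxicity": True}, {"Toxicity": 0.0}))  # allow
```

This policy keeps blocking and logging separate. A scanner's hard verdict always wins, and risk scores only decide whether an otherwise valid request is worth recording.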

As a guardrail framework, it covers both directions of the conversation: input scanners handle things like prompt injection, toxic language, token limits, and PII anonymization, while output scanners check for data leakage, refusals, relevance, malicious URLs, and more.
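The following is a toy model of how a chain of scanners can work. It is an illustration, not LLM Guard's source code. It assumes each scanner is a function that returns `(sanitized_text, is_valid, risk_score)`, and that each scanner receives the text as cleaned by the one before it. The scanners `ban_word` and `strip_whitespace` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# A scanner takes text and returns (sanitized_text, is_valid, risk_score).
Scanner = Callable[[str], Tuple[str, bool, float]]


def run_scanners(
    scanners: Dict[str, Scanner], text: str
) -> Tuple[str, Dict[str, bool], Dict[str, float]]:
    """Run scanners in order, feeding each one the previous sanitized text."""
    valid: Dict[str, bool] = {}
    scores: Dict[str, float] = {}
    for name, scanner in scanners.items():
        text, valid[name], scores[name] = scanner(text)
    return text, valid, scores


def ban_word(text: str) -> Tuple[str, bool, float]:
    # Hypothetical input scanner: flags prompts that mention "password".
    hit = "password" in text.lower()
    return text, not hit, 1.0 if hit else 0.0


def strip_whitespace(text: str) -> Tuple[str, bool, float]:
    # Hypothetical sanitizer: always valid, only normalizes spacing.
    return " ".join(text.split()), True, 0.0


text, valid, scores = run_scanners(
    {"BanWord": ban_word, "Strip": strip_whitespace}, "  tell me   the password "
)
print(text)   # tell me the password
print(valid)  # {'BanWord': False, 'Strip': True}
```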

What it does

  • More than 35 input and output scanners covering prompt injection, toxicity, PII, secrets, bias, and harmful content
  • Input scanners include Anonymize, PromptInjection, Toxicity, TokenLimit, BanTopics, Secrets, and Language
  • Output scanners include Deanonymize, NoRefusal, Relevance, Sensitive, MaliciousURLs, and FactualConsistency
  • A Vault object pairs Anonymize with Deanonymize so PII can be masked on the way in and restored on the way out
  • Each scan returns sanitized text plus per-scanner validity flags and risk scores for your own block or allow logic
  • Can be embedded in Python code or deployed as a separate API service
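The Vault idea works in two steps. On the way in, PII is swapped for placeholders and the originals are remembered. On the way out, the originals are put back. The sketch below shows that round trip under simplifying assumptions. It detects only email addresses, using a simple regex that stands in for LLM Guard's far more robust detection. The vault is a plain list, and the placeholder format `[REDACTED_EMAIL_n]` is hypothetical.

```python
import re
from typing import List, Tuple

# Toy email pattern; a stand-in for real PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize(text: str, vault: List[Tuple[str, str]]) -> str:
    """Replace each email with a numbered placeholder and store the pair."""
    def repl(match: re.Match) -> str:
        placeholder = f"[REDACTED_EMAIL_{len(vault) + 1}]"
        vault.append((placeholder, match.group(0)))
        return placeholder

    return EMAIL.sub(repl, text)


def deanonymize(text: str, vault: List[Tuple[str, str]]) -> str:
    """Put the original values back wherever their placeholders appear."""
    for placeholder, original in vault:
        text = text.replace(placeholder, original)
    return text


vault: List[Tuple[str, str]] = []
masked = anonymize("Email jane@example.com about the invoice", vault)
print(masked)  # Email [REDACTED_EMAIL_1] about the invoice

# The model only ever sees the placeholder, so its reply contains it too.
reply = "I drafted a reply to [REDACTED_EMAIL_1]."
print(deanonymize(reply, vault))  # I drafted a reply to jane@example.com.
```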

Getting started

Install the package with pip, then run your prompts through input scanners and your model's responses through output scanners. LLM Guard requires Python 3.9 or higher.

Install LLM Guard

Install the package from PyPI. Base functionality needs only a few libraries; more advanced scanners pull in extra dependencies automatically when used.

```bash
pip install llm-guard
```
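To confirm the environment before running any scanners, a small check such as the one below can help. It assumes the installed distribution is named `llm-guard`, matching the pip command. It uses only the standard library's `importlib.metadata`. The helper name `llm_guard_version` is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def llm_guard_version() -> Optional[str]:
    """Return the installed llm-guard version, or None if it is not installed."""
    try:
        return version("llm-guard")
    except PackageNotFoundError:
        return None


# The page states that LLM Guard requires Python 3.9 or higher.
python_ok = sys.version_info >= (3, 9)
print(f"Python OK: {python_ok}, llm-guard: {llm_guard_version() or 'not installed'}")
```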

Scan a prompt

Build a list of input scanners and pass it to scan_prompt. You get back the sanitized prompt, a validity flag per scanner, and a risk score per scanner.

```python
import sys

from llm_guard import scan_prompt
from llm_guard.input_scanners import Anonymize, PromptInjection, TokenLimit, Toxicity
from llm_guard.vault import Vault

# Placeholder: replace with the incoming user prompt
prompt = "..."

# The Vault stores PII that Anonymize masks so Deanonymize can restore it later
vault = Vault()
input_scanners = [Anonymize(vault), Toxicity(), TokenLimit(), PromptInjection()]

# Returns the sanitized prompt plus per-scanner dicts of validity flags and risk scores
sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)
if any(not result for result in results_valid.values()):
    print(f"Prompt {prompt} is not valid, scores: {results_score}")
    sys.exit(1)

print(f"Prompt: {sanitized_prompt}")
```
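`TokenLimit()` in the scanner list guards against overlong prompts. The toy function below illustrates the idea: truncate the prompt and mark it invalid when it exceeds a limit. It is not LLM Guard's implementation. LLM Guard counts real model tokens, whereas this sketch assumes whitespace-separated words as a stand-in. The default `limit` of 8 is arbitrary.

```python
from typing import Tuple


def token_limit(prompt: str, limit: int = 8) -> Tuple[str, bool, float]:
    """Toy token limit: truncate to `limit` words and flag overlong prompts.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    words = prompt.split()
    if len(words) <= limit:
        return prompt, True, 0.0
    return " ".join(words[:limit]), False, 1.0


print(token_limit("one two three", limit=5))  # ('one two three', True, 0.0)
print(token_limit("a b c d e f", limit=4))    # ('a b c d', False, 1.0)
```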

Scan a response

After the model replies, pass the output scanners, the sanitized prompt, and the response text to scan_output. The Vault lets Deanonymize restore any PII that Anonymize masked earlier.

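```python
import sys

from llm_guard import scan_output
from llm_guard.output_scanners import Deanonymize, NoRefusal, Relevance, Sensitive

# Continues the previous snippet: `vault` and `sanitized_prompt` come from there.
# Placeholder: replace with your model's reply to sanitized_prompt.
response_text = "..."

# Deanonymize reuses the same vault to restore PII that Anonymize masked
output_scanners = [Deanonymize(vault), NoRefusal(), Relevance(), Sensitive()]

sanitized_response_text, results_valid, results_score = scan_output(
    output_scanners, sanitized_prompt, response_text
)
if any(not result for result in results_valid.values()):
    print(f"Output {response_text} is not valid, scores: {results_score}")
    sys.exit(1)

print(f"Output: {sanitized_response_text}")
```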
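The `Relevance` scanner checks whether the response actually addresses the prompt. The sketch below shows the underlying idea: score the similarity between prompt and response, then compare it against a threshold. It uses bag-of-words cosine similarity as a crude stand-in for the embedding model a real scanner would use. The helper names and the 0.2 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (a crude stand-in for embeddings)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def is_relevant(prompt: str, response: str, threshold: float = 0.2) -> bool:
    # Hypothetical cut-off: below it, the response is treated as off-topic.
    return bow_cosine(prompt, response) >= threshold


q = "How do I reset my router password?"
print(is_relevant(q, "To reset the router password, hold the reset button."))  # True
print(is_relevant(q, "Bananas are rich in potassium."))  # False
```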

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

  • Block prompt injection attempts before they reach your model in a customer-facing chatbot
  • Mask PII in user input and restore it in the response so private data never reaches the LLM provider
  • Filter toxic, biased, or off-topic model output before it is shown to end users
  • Enforce token limits, banned topics, and secret detection as a shared guardrail layer across multiple LLM apps
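Secret detection follows the same pattern as the other scanners: find sensitive strings, redact them, and flag the text. The sketch below demonstrates this with two illustrative regex patterns, one for AWS access key IDs and one for GitHub personal access tokens. It is not LLM Guard's Secrets scanner, which relies on a dedicated detection library covering many more formats. The pattern labels and redaction format are hypothetical.

```python
import re
from typing import Tuple

# Illustrative patterns only; a real secrets scanner covers many more formats.
SECRET_PATTERNS = {
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def redact_secrets(text: str) -> Tuple[str, bool, float]:
    """Replace matched secrets with labels; the text is invalid if any were found."""
    found = False
    for label, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{label}]", text)
        found = found or count > 0
    return text, not found, 1.0 if found else 0.0


print(redact_secrets("key=AKIAABCDEFGHIJKLMNOP please deploy"))
# ('key=[REDACTED_AWS_ACCESS_KEY] please deploy', False, 1.0)
```

Because the function returns the same `(text, is_valid, risk_score)` shape as the other sketches, one shared implementation like this could be reused by several apps as part of a common guardrail layer.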

How LLM Guard compares

LLM Guard alongside other open-source guardrail and security tools tracked by AI/TLDR, ranked by GitHub stars.

| Tool | Stars | What it does |
|---|---|---|
| Microsoft Presidio | ★ 9.3k | A framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers. |
| Guardrails AI | ★ 7k | A Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users. |
| NeMo Guardrails | ★ 6.5k | NVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers. |
| GLiNER | ★ 3.3k | A small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio. |
| LLM Guard | ★ 3.1k | A security toolkit that scans LLM prompts and responses for unsafe content. |
| Rebuff | ★ 1.5k | A prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application. |
| Detoxify | ★ 1.3k | Pretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs. |
| Vigil | ★ 482 | A Python library and REST API that scans LLM prompts and responses with YARA rules, transformer classifiers, and vector similarity to flag prompt injections and jailbreaks. |