AI/TLDR

Vigil

Scan LLM prompts and responses to catch prompt injections and jailbreaks

Overview

Vigil is a security scanner for large language model prompts and responses. You give it the text going into or coming out of an LLM, and it runs that text through a set of modular scanners to flag prompt injections, jailbreaks, and other risky inputs. It ships as both a Python library you import into your own app and a REST API server you can run as a standalone service.

It is aimed at engineers building LLM-backed applications who want a detection layer in front of their model. Vigil bundles the detection signatures and datasets you need to self-host, and lets you add your own YARA rules and embedding datasets. The project describes itself as alpha and meant for research, so it is best treated as one layer in a defense-in-depth setup rather than a complete fix.

As a guardrail framework, Vigil sits between your application and the model. It combines several techniques in one pipeline: vector similarity against known attacks, YARA heuristics, a transformer classifier, prompt-response similarity, canary tokens, and sentiment analysis. Detections from each scanner are reported together so you can decide how to handle a flagged prompt.

What it does

  • Modular, extensible scanners you can mix and match per pipeline
  • YARA-based heuristic signatures, plus support for your own custom rules
  • Vector database / text similarity scanning, with optional auto-updating on detected prompts
  • Transformer model classifier and prompt-response similarity checks
  • Canary tokens and sentiment analysis scanners
  • Runs as a Python library or a REST API server, with a Streamlit web UI playground

Getting started

Vigil installs from source. Clone the repo, install YARA, set up a virtualenv, install the library, then use it from Python or run the API server.

Clone the repository

Grab the source and change into the directory. You also need YARA v4.3.2 installed separately (see the YARA getting-started docs).

bashbash
git clone https://github.com/deadbits/vigil-llm.git
cd vigil-llm

Set up a virtualenv and install Vigil

Create and activate a virtual environment, then install the library in editable mode.

bashbash
python3 -m venv venv
source venv/bin/activate
pip install -e .

Scan a prompt from Python

Import the Vigil class, load it from your config file, and run a prompt through the input scanners.

pythonpython
from vigil.vigil import Vigil

app = Vigil.from_config('conf/openai.conf')

result = app.input_scanner.perform_scan(
    input_prompt="prompt goes here"
)

Or run the REST API server

Start Vigil as an HTTP service using your server config.

bashbash
python vigil-server.py --conf conf/server.conf

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Screen user prompts for known prompt-injection and jailbreak patterns before they reach your LLM
  • Add a guardrail microservice in front of an existing chatbot or RAG app via the REST API
  • Maintain custom YARA signatures and embedding datasets for attacks specific to your product
  • Use canary tokens and prompt-response similarity to catch data exfiltration or leaked system prompts

How Vigil compares

Vigil alongside other open-source guardrails & security tools AI/TLDR tracks, ranked by GitHub stars.

ToolStarsWhat it does
Microsoft Presidio★ 9.3kA framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers.
Guardrails AI★ 7kA Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users.
NeMo Guardrails★ 6.5kNVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers.
GLiNER★ 3.3kA small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio.
LLM Guard★ 3.1kA security toolkit from Protect AI with 35+ input and output scanners that sanitize prompts and responses for prompt injection, toxicity, PII leakage, and harmful content.
Rebuff★ 1.5kA prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application.
Detoxify★ 1.3kPretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs.
Vigil★ 482Scan LLM prompts and responses to catch prompt injections and jailbreaks