Vigil

Scan LLM prompts and responses to catch prompt injections and jailbreaks

github.com/deadbits/vigil-llm★ 482 vigil.deadbits.ai

Overview

Vigil is a security scanner for large language model prompts and responses. You give it the text going into or coming out of an LLM, and it runs that text through a set of modular scanners to flag prompt injections, jailbreaks, and other risky inputs. It ships as both a Python library you import into your own app and a REST API server you can run as a standalone service.

It is aimed at engineers building LLM-backed applications who want a detection layer in front of their model. Vigil bundles the detection signatures and datasets you need to self-host, and lets you add your own YARA rules and embedding datasets. The project describes itself as alpha and meant for research, so it is best treated as one layer in a defense-in-depth setup rather than a complete fix.

As a guardrail framework, Vigil sits between your application and the model. It combines several techniques in one pipeline: vector similarity against known attacks, YARA heuristics, a transformer classifier, prompt-response similarity, canary tokens, and sentiment analysis. Detections from each scanner are reported together so you can decide how to handle a flagged prompt.

What it does

Modular, extensible scanners you can mix and match per pipeline
YARA-based heuristic signatures, plus support for your own custom rules
Vector database / text similarity scanning, with optional auto-updating on detected prompts
Transformer model classifier and prompt-response similarity checks
Canary tokens and sentiment analysis scanners
Runs as a Python library or a REST API server, with a Streamlit web UI playground

Getting started

Vigil installs from source. Clone the repo, install YARA, set up a virtualenv, install the library, then use it from Python or run the API server.

Clone the repository

Grab the source and change into the directory. You also need YARA v4.3.2 installed separately (see the YARA getting-started docs).

bashbash

git clone https://github.com/deadbits/vigil-llm.git
cd vigil-llm

Set up a virtualenv and install Vigil

Create and activate a virtual environment, then install the library in editable mode.

bashbash

python3 -m venv venv
source venv/bin/activate
pip install -e .

Scan a prompt from Python

Import the Vigil class, load it from your config file, and run a prompt through the input scanners.

pythonpython

from vigil.vigil import Vigil

app = Vigil.from_config('conf/openai.conf')

result = app.input_scanner.perform_scan(
    input_prompt="prompt goes here"
)

Or run the REST API server

Start Vigil as an HTTP service using your server config.

bashbash

python vigil-server.py --conf conf/server.conf

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Screen user prompts for known prompt-injection and jailbreak patterns before they reach your LLM
Add a guardrail microservice in front of an existing chatbot or RAG app via the REST API
Maintain custom YARA signatures and embedding datasets for attacks specific to your product
Use canary tokens and prompt-response similarity to catch data exfiltration or leaked system prompts

How Vigil compares

Vigil alongside other open-source guardrails & security tools AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Microsoft Presidio	★ 9.3k	A framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers.
Guardrails AI	★ 7k	A Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users.
NeMo Guardrails	★ 6.5k	NVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers.
GLiNER	★ 3.3k	A small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio.
LLM Guard	★ 3.1k	A security toolkit from Protect AI with 35+ input and output scanners that sanitize prompts and responses for prompt injection, toxicity, PII leakage, and harmful content.
Rebuff	★ 1.5k	A prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application.
Detoxify	★ 1.3k	Pretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs.
Vigil	★ 482	Scan LLM prompts and responses to catch prompt injections and jailbreaks

// Overview

// What it does

// Getting started

Clone the repository

Set up a virtualenv and install Vigil

Scan a prompt from Python

Or run the REST API server

// When to use it

// How Vigil compares

Overview

What it does

Getting started

When to use it

How Vigil compares