Overview
Vigil is a security scanner for large language model prompts and responses. You give it the text going into or coming out of an LLM, and it runs that text through a set of modular scanners to flag prompt injections, jailbreaks, and other risky inputs. It ships as both a Python library you import into your own app and a REST API server you can run as a standalone service.
It is aimed at engineers building LLM-backed applications who want a detection layer in front of their model. Vigil bundles the detection signatures and datasets you need to self-host, and lets you add your own YARA rules and embedding datasets. The project describes itself as alpha and meant for research, so it is best treated as one layer in a defense-in-depth setup rather than a complete fix.
As a guardrail framework, Vigil sits between your application and the model. It combines several techniques in one pipeline: vector similarity against known attacks, YARA heuristics, a transformer classifier, prompt-response similarity, canary tokens, and sentiment analysis. Detections from each scanner are reported together so you can decide how to handle a flagged prompt.
What it does
- Modular, extensible scanners you can mix and match per pipeline
- YARA-based heuristic signatures, plus support for your own custom rules
- Vector database / text similarity scanning, with optional auto-updating on detected prompts
- Transformer model classifier and prompt-response similarity checks
- Canary tokens and sentiment analysis scanners
- Runs as a Python library or a REST API server, with a Streamlit web UI playground
Getting started
Vigil installs from source. Clone the repo, install YARA, set up a virtualenv, install the library, then use it from Python or run the API server.
Clone the repository
Grab the source and change into the directory. You also need YARA v4.3.2 installed separately (see the YARA getting-started docs).
git clone https://github.com/deadbits/vigil-llm.git
cd vigil-llmSet up a virtualenv and install Vigil
Create and activate a virtual environment, then install the library in editable mode.
python3 -m venv venv
source venv/bin/activate
pip install -e .Scan a prompt from Python
Import the Vigil class, load it from your config file, and run a prompt through the input scanners.
from vigil.vigil import Vigil
app = Vigil.from_config('conf/openai.conf')
result = app.input_scanner.perform_scan(
input_prompt="prompt goes here"
)Or run the REST API server
Start Vigil as an HTTP service using your server config.
python vigil-server.py --conf conf/server.confCommands and code are distilled from the project's own documentation — always check the official repo for the latest.
When to use it
- Screen user prompts for known prompt-injection and jailbreak patterns before they reach your LLM
- Add a guardrail microservice in front of an existing chatbot or RAG app via the REST API
- Maintain custom YARA signatures and embedding datasets for attacks specific to your product
- Use canary tokens and prompt-response similarity to catch data exfiltration or leaked system prompts
How Vigil compares
Vigil alongside other open-source guardrails & security tools AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Microsoft Presidio | ★ 9.3k | A framework for detecting, redacting, masking, and anonymizing personal data (PII) in text, images, and structured data using NER models, regex, and rule-based recognizers. |
| Guardrails AI | ★ 7k | A Python framework that wraps LLM calls with composable input/output validators (from the Guardrails Hub) to check structure, type, and safety risks before responses reach users. |
| NeMo Guardrails | ★ 6.5k | NVIDIA's toolkit for adding programmable rails to LLM chat apps, using the Colang language to control dialog flow and block jailbreaks, prompt injection, and off-topic answers. |
| GLiNER | ★ 3.3k | A small zero-shot named-entity recognition model that can extract arbitrary entity types from text and is widely used as a PII detection backend, including inside Presidio. |
| LLM Guard | ★ 3.1k | A security toolkit from Protect AI with 35+ input and output scanners that sanitize prompts and responses for prompt injection, toxicity, PII leakage, and harmful content. |
| Rebuff | ★ 1.5k | A prompt injection detector that combines heuristics, an LLM-based classifier, a vector store of past attacks, and canary tokens to catch attempts to subvert an LLM application. |
| Detoxify | ★ 1.3k | Pretrained transformer models from Unitary that score text for toxicity, insults, threats, and hate speech, often used to moderate LLM inputs and outputs. |
| Vigil | ★ 482 | Scan LLM prompts and responses to catch prompt injections and jailbreaks |