AI/TLDR

Strix

Open-source AI agents that hack your app to find and prove real vulnerabilities

Overview

Strix is an open-source security testing tool built around autonomous AI agents that behave like real hackers. Instead of only scanning code statically, the agents run your application dynamically, look for weaknesses, and confirm each finding with an actual proof-of-concept so you get fewer false positives.

It is aimed at developers and security teams who want fast, accurate testing without the cost of manual pentesting. Strix ships as a developer-first CLI that produces actionable reports, can run teams of agents that collaborate, and plugs into CI/CD so insecure code can be caught before it reaches production.

What it does

  • Full hacker toolkit out of the box: HTTP proxy, browser automation, terminal shells, and a Python runtime for custom exploits
  • Real validation: each finding is backed by a working proof-of-concept, which cuts down on false positives
  • Detects a wide range of issues: access control (IDOR, privilege escalation), injection (SQL, NoSQL, command), SSRF/XXE, XSS, business-logic and auth flaws
  • Graph of Agents: multiple specialized agents run in parallel and share discoveries for broad coverage
  • Works with any supported LLM provider (OpenAI, Anthropic, Google, and others), set via environment variables
  • Headless non-interactive mode and a GitHub Actions workflow for scanning pull requests in CI/CD

Getting started

Strix needs Docker running and an LLM API key from a supported provider (OpenAI, Anthropic, Google, etc.). You install it with a single script, point it at your AI provider, and run a scan against a local directory, a GitHub repo, or a live URL.
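
The prerequisites above can be checked before installing anything. The following is a minimal sketch of such a preflight check in POSIX shell; `have_cmd` and `strix_preflight` are hypothetical helper names introduced here for illustration and are not part of Strix.

```shell
# Preflight sketch: confirm the tools this guide relies on are on PATH.
# have_cmd and strix_preflight are hypothetical helpers, not part of Strix.

have_cmd() {
  # Succeeds if the named command can be found.
  command -v "$1" >/dev/null 2>&1
}

strix_preflight() {
  # Prints one line per missing tool; returns non-zero if any are missing.
  missing=0
  for tool in "$@"; do
    if ! have_cmd "$tool"; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (not run here): check for docker and curl, then confirm the
# Docker daemon is reachable, since Strix needs Docker running.
#   strix_preflight docker curl && docker info >/dev/null
```

Finding `docker` on PATH only shows the client is installed; the `docker info` call in the commented example is what would fail if the daemon is not running.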

Install Strix

Install the CLI with the official install script.

curl -sSL https://strix.ai/install | bash

Configure your AI provider

Set the model and API key as environment variables. Strix saves this configuration to ~/.strix/cli-config.json so you do not have to re-enter it each run.

export STRIX_LLM="openai/gpt-5.4"
export LLM_API_KEY="your-api-key"
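
A wrapper script may want to fail early when either variable is missing. Below is a small sketch of such a guard; `strix_env_ok` is a hypothetical helper name, and the only names taken from the text above are `STRIX_LLM` and `LLM_API_KEY`.

```shell
# strix_env_ok is a hypothetical helper, not part of the Strix CLI.
# It checks that both variables are non-empty in the current shell;
# it does not verify that they are exported or that the key is valid.
strix_env_ok() {
  ok=0
  for name in STRIX_LLM LLM_API_KEY; do
    # Indirect lookup of the variable whose name is in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      ok=1
    fi
  done
  return "$ok"
}

# Example (not run here):
#   strix_env_ok && strix --target ./app-directory
```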

Run your first security assessment

Point Strix at a local app directory. The first run automatically pulls the sandbox Docker image, and results are saved to strix_runs/<run-name>.

strix --target ./app-directory
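
Since each run writes to its own directory under strix_runs/, a short helper can locate the most recent one. This is a sketch only: `latest_strix_run` is a hypothetical name, and it assumes nothing about what a run directory contains, only that each run is a subdirectory as described above.

```shell
# latest_strix_run is a hypothetical helper, not part of the Strix CLI.
# It prints the most recently modified subdirectory of strix_runs/
# (or of the directory passed as the first argument).
latest_strix_run() {
  runs_dir=${1:-strix_runs}
  [ -d "$runs_dir" ] || { echo "no such directory: $runs_dir" >&2; return 1; }
  # ls -t sorts newest first; the trailing slash in the glob limits matches to directories.
  latest=$(ls -td "$runs_dir"/*/ 2>/dev/null | head -n 1)
  [ -n "$latest" ] || { echo "no runs found in $runs_dir" >&2; return 1; }
  # Strip the trailing slash that came from the glob.
  printf '%s\n' "${latest%/}"
}

# Example (not run here):
#   ls "$(latest_strix_run)"
```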

Scan a repo or live app

You can also target a GitHub repository or a deployed application for a black-box assessment.

strix --target https://github.com/org/repo
strix --target https://your-app.com
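
When a script scans a mix of targets, it can help to label each one by the three kinds mentioned above (local directory, GitHub repository, live URL), for example to log what is being tested. The sketch below shows one way to do that; `target_kind` is a hypothetical helper and says nothing about how Strix itself interprets a target.

```shell
# target_kind is a hypothetical helper for wrapper scripts; it is not
# how Strix itself decides what a target is.
target_kind() {
  case $1 in
    # Checked first so GitHub URLs are not reported as generic live URLs.
    https://github.com/*/*) echo "github-repo" ;;
    http://*|https://*)     echo "live-url" ;;
    *)
      if [ -d "$1" ]; then
        echo "local-dir"
      else
        echo "unknown"
        return 1
      fi
      ;;
  esac
}

# Example (not run here):
#   for t in ./app-directory https://github.com/org/repo https://your-app.com; do
#     echo "scanning $(target_kind "$t"): $t"
#     strix --target "$t"
#   done
```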

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Application security testing: detect and validate critical vulnerabilities in your applications
  • Rapid penetration testing: get pentests done in hours instead of weeks, with compliance reports
  • Bug bounty automation: automate research and generate proof-of-concepts for faster reporting
  • CI/CD integration: run tests on pull requests to block vulnerabilities before they reach production

How Strix compares

The table below places Strix alongside the other open-source evaluation and red-teaming tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Strix | ★ 26.1k | Open-source AI agents that hack your app to find and prove real vulnerabilities |
| promptfoo | ★ 22.4k | A developer-first CLI and library for testing and comparing prompts and models, with red-teaming probes for prompt injection, PII leaks, and other vulnerabilities. |
| OpenAI Evals | ★ 18.7k | A framework and open registry for building and running evaluations of LLMs and LLM-based systems, including prompt chains and tool-using agents. |
| DeepEval | ★ 16.3k | An open-source Python framework that tests LLM apps like unit tests, with 50+ metrics for RAG, agents, chatbots, and safety, and a Pytest integration for CI/CD. |
| Ragas | ★ 14.4k | An evaluation toolkit focused on retrieval-augmented generation that scores answer faithfulness, context precision/recall, and relevancy, often without needing ground-truth labels. |
| Arize Phoenix | ★ 10.2k | An open-source observability and evaluation tool for tracing LLM and agent behavior, running evals on traces, and troubleshooting issues in development and production. |
| garak | ★ 8.2k | An LLM vulnerability scanner from NVIDIA with 100+ attack probes that test models for prompt injection, data leakage, jailbreaks, and other security weaknesses. |
| Giskard | ★ 5.4k | An open-source library for testing and scanning LLM and ML models for issues like hallucination, bias, and toxicity, including multi-turn agent testing and a vulnerability scanner. |