Overview
Katana is a command-line crawling and spidering framework written in Go, built by ProjectDiscovery. You point it at a URL or a list of URLs, and it walks the site to discover reachable URLs, endpoints, and JavaScript files.
It is aimed at security testers, bug bounty hunters, and engineers who need to map a site's attack surface or build automation pipelines. Katana runs in a standard HTTP mode or a headless (real browser) mode, and can parse JavaScript files to find endpoints that a plain crawler would miss.
Within the web scraping and crawling category, Katana focuses on speed and scriptability. It reads input from STDIN, a single URL, or a file list, and writes output to STDOUT, a file, or JSON, so it fits cleanly into shell pipelines alongside other tools.
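As a rough illustration of that pipeline fit, the sketch below defines a small filter that reads URLs line by line on STDIN (the shape of Katana's plain STDOUT output) and prints each distinct host once. The name `unique_hosts` is hypothetical and not part of Katana, and the `-silent` flag in the usage comment is an assumption to confirm with `katana -h`.

```shell
# unique_hosts: read URLs on STDIN, print each distinct host once (sorted).
# Hypothetical helper for illustration; not part of Katana.
unique_hosts() {
  # Splitting on "/" makes field 3 of "scheme://host/path" the host.
  awk -F/ '/^https?:\/\// { print $3 }' | sort -u
}

# Possible usage (assumes katana is installed and on PATH):
#   katana -u https://example.com -silent | unique_hosts
```

Because the helper only reads STDIN and writes STDOUT, it can sit between Katana and any other line-oriented tool.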
What it does
- Standard and headless (real browser) crawling modes
- JavaScript parsing and crawling to surface hidden endpoints
- Scope control via preconfigured fields or custom regex
- Customizable automatic form filling (experimental)
- Flexible input (STDIN, URL, list) and output (STDOUT, file, JSON)
- Configurable crawl depth, duration, and per-domain page limits
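The depth, JavaScript, and scope options listed above might be combined along these lines. This is a minimal sketch, not a recommended configuration: the wrapper name `scoped_crawl` is hypothetical, and the flags (`-d`, `-jc`, `-fs`, `-silent`) are assumed from the project's help output, so verify them with `katana -h`. Duration and per-domain page limits are left out because their exact flag names are not given here.

```shell
# scoped_crawl TARGET: run a standard-mode crawl with explicit limits.
# Hypothetical wrapper; flag names are assumptions to verify with `katana -h`.
scoped_crawl() {
  target="$1"
  # -d 3    : follow links up to three levels deep
  # -jc     : also parse JavaScript files for endpoints
  # -fs rdn : keep the crawl within the target's root domain name
  # -silent : print only the crawl output, without banner or log lines
  katana -u "$target" -d 3 -jc -fs rdn -silent
}

# Possible usage (assumes katana is installed and on PATH):
#   scoped_crawl https://example.com
```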
Getting started
Katana requires Go 1.26+ to install from source, or you can pull a prebuilt Docker image. Once installed, you can crawl a target with a single command.
Install with Go
Install the latest Katana binary using the Go toolchain. CGO must be enabled.
CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest
Or pull the Docker image
If you prefer not to build from source, pull the official image.
docker pull projectdiscovery/katana:latest
Crawl a target
Pass a URL with -u to start a standard crawl. Add -headless for browser-based crawling.
katana -u https://tesla.com
Check the available flags
Run the help command to see all supported switches, including depth, JS crawling, and scope control.
katana -h
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Mapping a web application's URLs and endpoints during a security assessment or bug bounty
- Extracting endpoints from JavaScript files that a basic crawler would miss
- Feeding discovered URLs into automation pipelines via STDIN/STDOUT and JSON output
- Crawling single-page apps and JS-heavy sites in headless mode with a real browser
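For the JavaScript use case above, one possible post-processing step is to keep only the script URLs from a crawl so they can be handed to another tool. The filter name `js_urls` is hypothetical, and the `-jc` and `-silent` flags in the usage comment are assumptions to check against `katana -h`.

```shell
# js_urls: read URLs on STDIN and keep only those ending in .js,
# optionally followed by a query string. Duplicates are removed.
# Hypothetical helper for illustration; not part of Katana.
js_urls() {
  grep -E '\.js(\?.*)?$' | sort -u
}

# Possible usage (assumes katana is installed and on PATH):
#   katana -u https://example.com -jc -silent | js_urls > js-files.txt
```

The pattern is anchored at the end of the line, so `.json` URLs are not mistaken for scripts. URLs carrying a `#fragment` after `.js` are not matched by this simple version.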
How Katana compares
Katana alongside other open-source web scraping & crawling tools AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting. |
| Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go crawler that maps every URL, endpoint, and JS file on a target site. |