AI/TLDR

Katana

A fast Go crawler that maps every URL, endpoint, and JS file on a target site

Overview

Katana is a command-line crawling and spidering framework written in Go, built by ProjectDiscovery. You point it at a URL or a list of URLs, and it walks the site to discover every reachable URL, endpoint, and JavaScript file.

It is aimed at security testers, bug bounty hunters, and engineers who need to map a site's attack surface or build automation pipelines. Katana runs in a standard HTTP mode or a headless (real browser) mode, and can parse JavaScript files to find endpoints that a plain crawler would miss.

Within the web scraping and crawling category, Katana focuses on speed and scriptability. It reads input from STDIN, a single URL, or a file list, and writes output to STDOUT, a file, or JSON, so it fits cleanly into shell pipelines alongside other tools.
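Because the crawler writes plain text to STDOUT, ordinary shell filters can post-process its results. The following is a minimal sketch, assuming the output is one absolute URL per line; the helper name `unique_endpoints` and the `example.com` URLs are hypothetical, and canned input stands in for a real crawl.

```bash
#!/bin/sh
# Collapse a stream of crawled URLs (one per line) into unique endpoints
# by dropping query strings and fragments, then de-duplicating.
unique_endpoints() {
  sed 's/[?#].*$//' | sort -u
}

# Canned input standing in for crawler output.
printf '%s\n' \
  'https://example.com/search?q=1' \
  'https://example.com/search?q=2' \
  'https://example.com/about#team' \
  'https://example.com/about' | unique_endpoints
```

In a real pipeline the `printf` would be replaced by the crawler itself, for example `katana -u https://tesla.com | unique_endpoints`, with the filtered list passed on to the next tool.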

What it does

  • Standard and headless (real browser) crawling modes
  • JavaScript parsing and crawling to surface hidden endpoints
  • Scope control via preconfigured fields or custom regex
  • Customizable automatic form filling (experimental)
  • Flexible input (STDIN, URL, list) and output (STDOUT, file, JSON)
  • Configurable crawl depth, duration, and per-domain page limits
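Several of the options above map to command-line switches. The following is a minimal sketch, assuming the short flags `-d` (crawl depth), `-jc` (JavaScript crawling) and `-fs` (preconfigured scope field, here `rdn`) behave as described in `katana -h`; verify them against your installed version. The helper name `katana_cmd`, its default depth and the `example.com` target are hypothetical, and the function only prints the command instead of running it.

```bash
#!/bin/sh
# Print (do not run) a katana command line for a target and crawl depth.
# Assumed flags: -d depth, -jc JavaScript crawling, -fs scope field.
katana_cmd() {
  target=$1
  depth=${2:-3}   # hypothetical default used by this helper
  printf 'katana -u %s -d %s -jc -fs rdn\n' "$target" "$depth"
}

# Preview the command for a depth-2 crawl.
katana_cmd https://example.com 2
```

Printing the command first makes it easy to review the scope and depth settings before pointing the crawler at a real target.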

Getting started

Katana requires Go 1.26+ to install from source, or you can pull a prebuilt Docker image. Once installed, you can crawl a target with a single command.

Install with Go

Install the latest Katana binary using the Go toolchain. CGO must be enabled.

bash
CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest

Or pull the Docker image

If you prefer not to build from source, pull the official image.

bash
docker pull projectdiscovery/katana:latest

Crawl a target

Pass a URL with -u to start a standard crawl. Add -headless for browser-based crawling.

bash
katana -u https://tesla.com

Check the available flags

Run the help command to see all supported switches, including depth, JS crawling, and scope control.

bash
katana -h

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Mapping a web application's URLs and endpoints during a security assessment or bug bounty
  • Extracting endpoints from JavaScript files that a basic crawler would miss
  • Feeding discovered URLs into automation pipelines via STDIN/STDOUT and JSON output
  • Crawling single-page apps and JS-heavy sites in headless mode with a real browser
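For the JavaScript use case above, a common follow-up step is to isolate the script files from the crawl results so they can be handed to another tool. The following is a minimal sketch, assuming one absolute URL per line on STDIN; the helper name `js_urls` and the `example.com` URLs are hypothetical.

```bash
#!/bin/sh
# Keep only unique JavaScript file URLs from a stream of crawled URLs.
# Matches paths ending in .js, optionally followed by a query or fragment.
js_urls() {
  grep -E '\.js([?#].*)?$' | sort -u
}

# Canned input standing in for crawler output.
printf '%s\n' \
  'https://example.com/' \
  'https://example.com/static/app.js' \
  'https://example.com/static/app.js' \
  'https://example.com/static/vendor.js?v=3' \
  'https://example.com/data.json' | js_urls
```

The pattern deliberately rejects `.json` and similar extensions, so only script files reach the next stage of the pipeline.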

How Katana compares

How Katana sits alongside the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
|---|---|---|
| Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting. |
| Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go crawler that maps every URL, endpoint, and JS file on a target site. |