Overview
Colly is a scraping framework for the Go programming language. It gives you a clean API to write crawlers, scrapers, and spiders that pull structured data out of websites.
It is aimed at Go developers who need to collect data at scale for jobs like data mining, data processing, or archiving. You attach callbacks to events such as a request being made or an HTML element being found, and Colly handles the crawling loop for you.
As a web scraping and crawling tool, Colly covers the common needs of production crawlers: per-domain request delays and concurrency limits, automatic cookie and session handling, caching, robots.txt support, and synchronous, asynchronous, or parallel scraping.
What it does
- Clean API for writing crawlers, scrapers, and spiders
- Fast: over 1k requests per second on a single core
- Manages request delays and maximum concurrency per domain
- Automatic cookie and session handling
- Sync, async, and parallel scraping modes
- Caching, robots.txt support, and distributed scraping
Getting started
Add Colly to a Go module, then register callbacks on a collector and start visiting pages.
Install Colly
Fetch the v2 module with go get inside your Go project.
go get github.com/gocolly/colly/v2
Crawl a page and follow its links
Create a collector, attach an OnHTML callback to follow links and an OnRequest callback to log each visit, then call Visit to start.
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log each request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
Explore more examples
See the _examples folder in the repository for more detailed, runnable scrapers.
Commands and code are distilled from the project's own documentation; always check the official repository for the latest version.
When to use it
- Crawling a website and following its internal links to map or archive its pages
- Extracting structured data from HTML for data mining or processing pipelines
- Running large-scale collection with per-domain rate limits and concurrency control
- Building async or distributed scrapers that handle cookies and caching automatically
How Colly compares
Colly alongside the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A fast, clean scraping framework for building web crawlers in Go. |
| Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site. |