AI/TLDR

Colly

A fast, clean scraping framework for building web crawlers in Go

Overview

Colly is a scraping framework for the Go programming language. It gives you a clean API to write crawlers, scrapers, and spiders that pull structured data out of websites.

It is aimed at Go developers who need to collect data at scale for jobs like data mining, data processing, or archiving. You attach callbacks to events such as a request being made or an HTML element being found, and Colly handles the crawling loop for you.

As a web scraping and crawling tool, Colly covers the common needs of production crawlers: per-domain request delays and concurrency limits, automatic cookie and session handling, caching, robots.txt support, and synchronous, asynchronous, or parallel scraping.

What it does

  • Clean API for writing crawlers, scrapers, and spiders
  • Fast: over 1k requests per second on a single core
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync, async, and parallel scraping modes
  • Caching, robots.txt support, and distributed scraping

Getting started

Add Colly to a Go module, then register callbacks on a collector and start visiting pages.

Install Colly

Fetch the v2 module with go get inside your Go project.

bash
go get github.com/gocolly/colly/v2

Crawl a page and follow its links

Create a collector, attach an OnHTML callback to follow links and an OnRequest callback to log each visit, then call Visit to start.

go
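package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links. Visit returns an error for links that are
	// skipped (for example, already visited), so it is ignored here.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		_ = e.Request.Visit(e.Attr("href"))
	})

	// Log each request before it is sent.
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start the crawl from the seed URL.
	if err := c.Visit("http://go-colly.org/"); err != nil {
		log.Fatal(err)
	}
}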

Explore more examples

See the _examples folder in the repository for more detailed, runnable scrapers.

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

  • Crawling a website and following its internal links to map or archive its pages
  • Extracting structured data from HTML for data mining or processing pipelines
  • Running large-scale collection with per-domain rate limits and concurrency control
  • Building async or distributed scrapers that handle cookies and caching automatically

How Colly compares

Colly alongside the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs.
Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines.
Scrapling | ★ 65k | A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls.
Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale.
ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts.
Colly | ★ 25.3k | A fast, clean scraping framework for building web crawlers in Go.
Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers.
Katana | ★ 17.1k | A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site.