Colly

A fast, clean scraping framework for building web crawlers in Go

github.com/gocolly/colly | ★ 25.3k | go-colly.org

Overview

Colly is a scraping framework for the Go programming language. It gives you a clean API to write crawlers, scrapers, and spiders that pull structured data out of websites.

It is aimed at Go developers who need to collect data at scale for jobs like data mining, data processing, or archiving. You attach callbacks to events such as a request being made or an HTML element being found, and Colly handles the crawling loop for you.

As a web scraping and crawling tool, Colly covers the common needs of production crawlers: per-domain request delays and concurrency limits, automatic cookie and session handling, caching, robots.txt support, and synchronous, asynchronous, or parallel scraping.

What it does

Clean API for writing crawlers, scrapers, and spiders
Fast: over 1k requests per second on a single core
Manages request delays and maximum concurrency per domain
Automatic cookie and session handling
Sync, async, and parallel scraping modes
Caching, robots.txt support, and distributed scraping

Getting started

Add Colly to a Go module, then register callbacks on a collector and start visiting pages.

Install Colly

Fetch the v2 module with go get inside your Go project.

bash

go get github.com/gocolly/colly/v2

Crawl a page and follow its links

Create a collector, attach an OnHTML callback to follow links and an OnRequest callback to log each visit, then call Visit to start.

go

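package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links; relative hrefs are resolved against the current page
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start the crawl; in the default synchronous mode Visit blocks until it finishes
	c.Visit("http://go-colly.org/")
}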

Explore more examples

See the _examples folder in the repository for more detailed, runnable scrapers.

Commands and code are distilled from the project's own documentation; always check the official repository for the latest version.

When to use it

Crawling a website and following its internal links to map or archive its pages
Extracting structured data from HTML for data mining or processing pipelines
Running large-scale collection with per-domain rate limits and concurrency control
Building async or distributed scrapers that handle cookies and caching automatically

How Colly compares

Colly alongside the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Firecrawl	★ 135k	A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs.
Crawl4AI	★ 68.9k	A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines.
Scrapling	★ 65k	A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls.
Scrapy	★ 62.3k	A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale.
ScrapeGraphAI	★ 27.4k	A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts.
Colly	★ 25.3k	A fast, clean scraping framework for building web crawlers in Go.
Crawlee	★ 23.8k	A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers.
Katana	★ 17.1k	A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site.
