AI/TLDR

Crawlee

A Node.js library for web scraping and browser automation, designed for building reliable crawlers

Overview

Crawlee is an open-source Node.js and TypeScript library for web scraping and browser automation. It gives you one interface for both plain HTTP crawling and headless browser crawling, so you can pull links, scrape data, and store results to disk or the cloud from a single codebase.

It is aimed at developers who build crawlers and need them to keep working against modern bot protection. Out of the box, your crawlers generate browser-like headers and human-like fingerprints, rotate proxies, and manage sessions, which helps them blend in without much manual tuning.
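To make the idea of proxy rotation with sessions concrete, here is a minimal conceptual sketch in plain JavaScript. It is not Crawlee's implementation or API: the names `createSessionPool`, `pick`, and `markBlocked`, the `maxErrors` threshold, and the proxy URLs are all hypothetical and chosen only for illustration.

```javascript
// Conceptual sketch only: this is NOT Crawlee's implementation or API.
// createSessionPool, pick, markBlocked, and maxErrors are hypothetical names.
function createSessionPool(proxyUrls, maxErrors = 2) {
    // One session per proxy; each keeps its own cookies and error count.
    const sessions = proxyUrls.map((proxyUrl, i) => ({
        id: `session_${i}`,
        proxyUrl,
        cookies: {},
        errors: 0,
    }));
    let cursor = 0;

    return {
        // Round-robin over the sessions that have not been retired yet.
        pick() {
            const usable = sessions.filter((s) => s.errors < maxErrors);
            if (usable.length === 0) throw new Error('All sessions are retired');
            const session = usable[cursor % usable.length];
            cursor += 1;
            return session;
        },
        // Record a block (for example an HTTP 403); too many retire the session.
        markBlocked(session) {
            session.errors += 1;
        },
    };
}

// Placeholder proxy URLs, assumed for the example.
const pool = createSessionPool([
    'http://proxy-a.example:8000',
    'http://proxy-b.example:8000',
]);
const first = pool.pick();  // uses proxy-a
const second = pool.pick(); // uses proxy-b
```

Tying cookies and error counts to a proxy is what lets a crawler drop an identity that a site has started blocking while the remaining identities keep working.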

Within the web scraping and crawling category, Crawlee covers the full job end to end: a persistent request queue, pluggable storage for tabular data and files, automatic scaling to system resources, plus configurable routing, error handling, and retries. It is built and maintained by Apify.
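As a rough illustration of how routing, error handling, and retries fit together, the sketch below dispatches each request to a handler by label and retries failed handlers a bounded number of times. It is a plain JavaScript model under stated assumptions, not Crawlee's API: `createRouter`, `processWithRetries`, the `DEFAULT` and `DETAIL` labels, and the `maxRetries` parameter are hypothetical names.

```javascript
// Conceptual sketch only; all names are hypothetical, not Crawlee's API.
function createRouter() {
    const handlers = new Map();
    return {
        addHandler(label, handler) {
            handlers.set(label, handler);
        },
        // Pick the handler by the request's label, falling back to DEFAULT.
        async handle(request) {
            const label = request.label ?? 'DEFAULT';
            const handler = handlers.get(label);
            if (!handler) throw new Error(`No handler for label ${label}`);
            return handler(request);
        },
    };
}

// Run the routed handler, retrying up to maxRetries times after the first try.
async function processWithRetries(router, request, maxRetries = 3) {
    let lastError;
    for (let attempt = 1; attempt <= maxRetries + 1; attempt += 1) {
        try {
            const value = await router.handle(request);
            return { ok: true, value, attempts: attempt };
        } catch (error) {
            lastError = error; // remember the failure and try again
        }
    }
    return { ok: false, error: lastError.message, attempts: maxRetries + 1 };
}

const router = createRouter();
router.addHandler('DEFAULT', async (req) => `listing:${req.url}`);

let calls = 0;
router.addHandler('DETAIL', async (req) => {
    calls += 1;
    if (calls < 3) throw new Error('temporary failure'); // fails twice, then succeeds
    return `detail:${req.url}`;
});

// The DETAIL handler succeeds on the third attempt.
processWithRetries(router, { url: 'https://example.com/item/1', label: 'DETAIL' })
    .then((result) => console.log(result));
```

Returning a result object instead of throwing after the final attempt keeps one bad page from stopping the whole crawl, which is the practical reason to separate retries from the handler logic.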

What it does

  • Single interface for both HTTP and headless browser crawling
  • Integrated proxy rotation and session management
  • Browser-like headers and human-like fingerprints, including replicated TLS fingerprints
  • Persistent URL queue (breadth- and depth-first) plus pluggable storage for data and files
  • Use Playwright or Puppeteer with the same API across Chrome, Firefox, and WebKit
  • Automatic scaling, configurable routing, error handling, and retries; written in TypeScript
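The difference between breadth-first and depth-first crawling comes down to which end of the URL queue the next request is taken from. The following is a small in-memory sketch of that idea, not Crawlee's persistent queue: the `links` site map and the `crawlOrder` function are hypothetical and exist only for this example.

```javascript
// Conceptual sketch only: an in-memory frontier, not Crawlee's persistent queue.
// `links` is a hypothetical site map: page -> pages it links to.
const links = {
    '/': ['/a', '/b'],
    '/a': ['/a/1', '/a/2'],
    '/b': ['/b/1'],
    '/a/1': [],
    '/a/2': [],
    '/b/1': [],
};

function crawlOrder(start, strategy) {
    const frontier = [start];
    const seen = new Set([start]); // deduplicate so each URL is visited once
    const visited = [];

    while (frontier.length > 0) {
        // Breadth-first takes the oldest URL (FIFO); depth-first the newest (LIFO).
        const url = strategy === 'bfs' ? frontier.shift() : frontier.pop();
        visited.push(url);
        for (const next of links[url] ?? []) {
            if (!seen.has(next)) {
                seen.add(next);
                frontier.push(next);
            }
        }
    }
    return visited;
}

console.log(crawlOrder('/', 'bfs')); // [ '/', '/a', '/b', '/a/1', '/a/2', '/b/1' ]
console.log(crawlOrder('/', 'dfs')); // [ '/', '/b', '/b/1', '/a', '/a/2', '/a/1' ]
```

Breadth-first order reaches every top-level section early, which suits broad discovery. Depth-first order finishes one branch before starting the next, which keeps the pending queue smaller on deep sites.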

Getting started

Crawlee requires Node.js 16 or higher. The fastest way to start is the Crawlee CLI, which scaffolds a project; you can also add Crawlee to an existing project manually.

Scaffold a project with the CLI

Run the Crawlee CLI and pick the getting-started example. It installs the dependencies and adds boilerplate code for you.

```bash
npx crawlee create my-crawler
cd my-crawler
npm start
```

Or install into your own project

Install Crawlee together with Playwright. The browser crawler needs Playwright, which is not bundled with Crawlee in order to keep the install size down.

```bash
npm install crawlee playwright
```

Write a minimal crawler

Create a PlaywrightCrawler that reads each page's title, saves it to a dataset, and follows links found on the page.

```js
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        // Save results as JSON to ./storage/datasets/default
        await Dataset.pushData({ title, url: request.loadedUrl });

        // Extract links and add them to the crawling queue.
        await enqueueLinks();
    },
});

await crawler.run(['https://crawlee.dev']);
```

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Scrape JavaScript-heavy sites that need a real browser to render content before extraction
  • Crawl and extract data from static HTML pages or JSON APIs using fast HTTP crawling
  • Build crawlers that need proxy rotation and human-like fingerprints to avoid bot blocks
  • Collect data into datasets and deploy the crawler with the provided Dockerfiles

How Crawlee compares

Crawlee alongside other open-source web scraping & crawling tools AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting. |
| Crawlee | ★ 23.8k | A Node.js library for web scraping and browser automation, designed for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site. |