Scrapy

A Python framework for crawling websites and extracting structured data at scale

github.com/scrapy/scrapy ★ 62.3k | scrapy.org

Overview

Scrapy is a Python framework for building web crawlers, called spiders, that visit pages, follow links, and pull out structured data such as text, prices, or listings. It runs on Linux, macOS, and Windows and requires Python 3.10 or newer.

It is aimed at developers who need to collect data from many pages reliably rather than write one-off scripts. You define a spider class that says which URLs to start from and how to parse each response, and Scrapy handles requests, scheduling, retries, and concurrency for you.

Within the web scraping and crawling space, Scrapy sits at the full-framework end: instead of stitching together an HTTP client and an HTML parser yourself, you get a project structure, CSS and XPath selectors, item pipelines, and export to JSON, CSV, or other formats. It is maintained by Zyte and many community contributors.

What it does

Define spiders as Python classes with start URLs and a parse method that yields data or follows links
Extract data with built-in CSS and XPath selectors on each response
Let Scrapy handle requests, scheduling, retries, and concurrent crawling so you don't manage them by hand
Clean, validate, and store scraped data with item pipelines
Export results to JSON, CSV, and other formats out of the box
Cross-platform support on Linux, macOS, and Windows with Python 3.10+

Getting started

Install Scrapy from PyPI, scaffold a project, write a small spider, then run it.

Install Scrapy

Install the package from PyPI. Scrapy requires Python 3.10 or newer.

pip install scrapy
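If you want to confirm the prerequisite and the install from Python itself, a small check along these lines may help. The helper name `python_is_supported` is made up for this sketch, and the minimum version is taken from the requirement stated above.

```python
import sys

# Minimum Python version, as stated in the text above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    import scrapy  # raises ImportError if the install did not succeed

    print("Scrapy", scrapy.__version__)
```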

Create a project

Scaffold a new Scrapy project to get the standard folder layout for spiders and settings.

scrapy startproject tutorial

Write a spider

Add a spider class that lists start URLs and parses each response. Save it in the project's spiders folder, for example as tutorial/spiders/quotes_spider.py.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }

Run the spider

Run the spider by its name from the project's top-level directory. Scrapy fetches the start URL and prints the extracted items in the log.

scrapy crawl quotes

Commands and code are distilled from the project's own documentation; always check the official repository for the latest version.

When to use it

Collect product listings, prices, or reviews across many pages of an e-commerce site
Build a dataset from public web pages for analysis or machine learning
Crawl a site by following links to gather structured records at scale
Run scheduled scraping jobs that export results to JSON or CSV for a data pipeline

How Scrapy compares

How Scrapy compares with the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Firecrawl	★ 135k	A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs.
Crawl4AI	★ 68.9k	A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines.
Scrapling	★ 65k	A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls.
Scrapy	★ 62.3k	A Python framework for crawling websites and extracting structured data at scale
ScrapeGraphAI	★ 27.4k	A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts.
Colly	★ 25.3k	A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting.
Crawlee	★ 23.8k	A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers.
Katana	★ 17.1k	A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site.
