AI/TLDR

Scrapling

Adaptive Python web scraping that survives site changes and bypasses anti-bot systems

Overview

Scrapling is an open-source Python framework for web scraping that handles everything from a single request to a full-scale crawl. Its parser learns from website changes and automatically relocates your elements when a page's structure updates, so selectors that would normally break keep working.
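To make "relocating" an element more concrete, here is a toy sketch of similarity-based matching using only the Python standard library. It is an illustration under stated assumptions, not Scrapling's actual algorithm or API. It records a fingerprint (tag, attributes, first text chunk) for an element, and on a changed page it picks the element whose fingerprint scores highest. The names `ElementCollector`, `similarity`, and `relocate`, the equal weighting of the three scores, and the sample HTML are all hypothetical.

```python
# Toy illustration only; not Scrapling's algorithm or API. All names are hypothetical.
from difflib import SequenceMatcher
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Record a fingerprint (tag, attributes, first text chunk) for every element."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append({"tag": tag, "attrs": dict(attrs), "text": ""})

    def handle_data(self, data):
        # Attach the first non-blank text chunk to the most recently opened element.
        if self.elements and not self.elements[-1]["text"] and data.strip():
            self.elements[-1]["text"] = data.strip()


def parse(html):
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def similarity(a, b):
    """Average of tag match, attribute agreement, and text similarity, each in [0, 1]."""
    tag_score = 1.0 if a["tag"] == b["tag"] else 0.0
    keys = set(a["attrs"]) | set(b["attrs"])
    attr_score = sum(a["attrs"].get(k) == b["attrs"].get(k) for k in keys) / len(keys) if keys else 1.0
    text_score = SequenceMatcher(None, a["text"], b["text"]).ratio()
    return (tag_score + attr_score + text_score) / 3


def relocate(saved, html):
    """Return the element on the new page that best matches the saved fingerprint."""
    return max(parse(html), key=lambda el: similarity(saved, el))


# "Save" step: fingerprint the price element on the original page.
old_html = '<div><p class="price" id="p1">$10</p><p class="note">Sale</p></div>'
saved = next(el for el in parse(old_html) if el["attrs"].get("class") == "price")

# "Relocate" step: the redesign renamed the class and moved the element, but it is still found.
new_html = '<section><span class="note">Sale</span><p class="cost" id="p1">$10</p></section>'
found = relocate(saved, new_html)
print(found)
```

A plain `.price` selector would return nothing on the new page; the similarity score still prefers the `<p>` that kept its tag, `id`, and text.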

On top of the parser, Scrapling ships fetchers built to avoid bot detection; the stealthiest of them can bypass anti-bot systems such as Cloudflare's Turnstile out of the box. A Scrapy-like spider framework lets you scale up to concurrent, multi-session crawls with pause and resume, streaming results, and built-in proxy rotation, all in a few lines of Python.
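Scrapling handles concurrency limits, per-domain throttling, and proxy rotation for you. As a rough illustration of how those three controls can interact, here is a stdlib-only asyncio sketch. It is not Scrapling's implementation: the class `ThrottledCrawler` and its parameters are hypothetical, the proxy URLs are placeholders, and the network request is replaced by `asyncio.sleep`.

```python
# Hypothetical sketch; not Scrapling's implementation. No real network traffic is sent.
import asyncio
import itertools
import time
from urllib.parse import urlsplit


class ThrottledCrawler:
    """Global concurrency cap, minimum spacing per domain, and round-robin proxies."""

    def __init__(self, proxies, concurrency=4, domain_delay=0.05):
        self.global_limit = asyncio.Semaphore(concurrency)
        self.domain_delay = domain_delay
        self.domain_locks = {}  # one lock per domain serializes the delay bookkeeping
        self.last_hit = {}      # domain -> monotonic time of the last request start
        self.proxy_cycle = itertools.cycle(proxies)

    async def _wait_for_domain(self, domain):
        lock = self.domain_locks.setdefault(domain, asyncio.Lock())
        async with lock:
            elapsed = time.monotonic() - self.last_hit.get(domain, float("-inf"))
            if elapsed < self.domain_delay:
                await asyncio.sleep(self.domain_delay - elapsed)
            self.last_hit[domain] = time.monotonic()

    async def fetch(self, url):
        # A global slot is held while waiting on the domain delay; a real crawler might reorder this.
        async with self.global_limit:
            await self._wait_for_domain(urlsplit(url).netloc)
            proxy = next(self.proxy_cycle)
            await asyncio.sleep(0.01)  # stand-in for a real request routed through `proxy`
            return {"url": url, "proxy": proxy}

    async def crawl(self, urls):
        # gather() returns results in the same order as `urls`.
        return await asyncio.gather(*(self.fetch(u) for u in urls))


async def main():
    crawler = ThrottledCrawler(proxies=["http://proxy-a:8080", "http://proxy-b:8080"])
    urls = [f"https://example.com/page/{i}" for i in range(4)]
    return await crawler.crawl(urls)


results = asyncio.run(main())
print(results)
```

The semaphore bounds total in-flight requests, while the per-domain lock spaces out hits to the same host even when slots are free.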

What it does

  • Adaptive parser: mark elements with auto_save, then pass adaptive=True later to re-find them after a site's design changes.
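  • Fetchers for different needs: Fetcher for fast HTTP requests that can impersonate a browser's TLS fingerprint and headers, DynamicFetcher for full browser automation, and StealthyFetcher, which spoofs browser fingerprints and can bypass Cloudflare Turnstile and similar anti-bot checks.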
  • Scrapy-like Spider API with start_urls, async parse callbacks, and Request/Response objects for full crawls.
  • Concurrent crawling with configurable concurrency limits, per-domain throttling, download delays, and built-in proxy rotation.
  • Checkpoint-based pause and resume: stop a crawl with Ctrl+C and restart to continue from where it left off.
  • Streaming mode, optional robots.txt compliance, a development cache mode, and built-in JSON/JSONL export.
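Scrapling manages its own checkpoints. The sketch below only illustrates the general checkpoint technique, under stated assumptions: the crawl frontier (pending URLs) and the set of seen URLs are written to a JSON file after every page, so a restart continues from the last save. The function names, the JSON layout, and the fake link graph are hypothetical, and `max_pages` stands in for pressing Ctrl+C.

```python
# Hypothetical sketch of checkpoint-based pause/resume; not Scrapling's file format or API.
import json
import os
import tempfile
from collections import deque
from pathlib import Path


def load_state(path, start_urls):
    """Resume from a checkpoint file if one exists; otherwise start fresh."""
    if path.exists():
        state = json.loads(path.read_text())
        return deque(state["pending"]), set(state["seen"])
    return deque(start_urls), set(start_urls)


def save_state(path, pending, seen):
    """Write to a temp file, then rename it over the old checkpoint so a crash never leaves half a file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"pending": list(pending), "seen": sorted(seen)}))
    os.replace(tmp, path)


def crawl(start_urls, path, get_links, max_pages):
    """Visit up to max_pages URLs, checkpointing after each one; return the URLs visited."""
    pending, seen = load_state(path, start_urls)
    visited = []
    while pending and len(visited) < max_pages:
        url = pending.popleft()  # stays in the saved frontier until the next save_state
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                pending.append(link)
        save_state(path, pending, seen)
    if not pending:
        path.unlink(missing_ok=True)  # crawl finished: nothing left to resume
    return visited


LINKS = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c"], "/c": []}  # fake site graph
ckpt = Path(tempfile.mkdtemp()) / "crawl.json"
first = crawl(["/"], ckpt, lambda u: LINKS[u], max_pages=2)    # "interrupted" early
second = crawl(["/"], ckpt, lambda u: LINKS[u], max_pages=10)  # resumes from the checkpoint
print(first, second)
```

Because a URL is only dropped from the saved frontier after its page is processed, an interruption mid-page means that page is fetched again on resume (at-least-once rather than exactly-once).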

Getting started

Scrapling requires Python 3.10 or higher. The base install includes only the parser engine; to use the fetchers and spiders, also install the fetchers extra and run scrapling install, which downloads the browsers and stealth dependencies they need.
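Because the parser and the fetchers are installed separately, a quick check can confirm the basics before you run anything. This hypothetical helper, `check_environment`, only checks the interpreter version and whether the top-level scrapling package is importable; it does not verify that the fetchers extra or the browsers downloaded by scrapling install are present.

```python
# Hypothetical helper; checks only the Python version and package importability.
import importlib.util
import sys


def check_environment(min_python=(3, 10), packages=("scrapling",)):
    """Report whether the interpreter is new enough and which packages can be imported."""
    return {
        "python_ok": sys.version_info >= min_python,
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


report = check_environment()
print(report)
```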

Install the parser

Install the core package from PyPI. This gives you the parser engine on its own, without the fetchers or browser dependencies.

```bash
pip install scrapling
```

Add fetchers and browsers

To use any fetcher or spider, install the fetchers extra and then run the install command to download the browsers and their system and fingerprint dependencies.

```bash
pip install "scrapling[fetchers]"
scrapling install
```

Fetch a page and scrape it

Use a fetcher to load a page, then select elements with CSS. Pass auto_save=True on the first run to save the elements, and pass adaptive=True later so Scrapling can re-find them if the site's structure changes.

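```python
from scrapling.fetchers import StealthyFetcher

StealthyFetcher.adaptive = True  # let fetched pages save and re-find elements
# Load the page in a headless stealth browser and wait for network activity to settle
page = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
products = page.css('.product', auto_save=True)  # first run: save the matched elements
# Later, after the site's structure changes, relocate the same elements
products = page.css('.product', adaptive=True)
```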

Scale up to a full crawl

Define a spider with start URLs and an async parse callback, yielding the data you want from each response, then start the crawl.

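```python
from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"                          # identifies the spider
    start_urls = ["https://example.com/"]  # where the crawl begins

    async def parse(self, response: Response):
        # Called for each downloaded page; every yielded dict is one scraped item
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}

MySpider().start()  # run the crawl from start_urls
```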
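The spider's parse callback is an async generator, so an engine can stream each item onward the moment it is yielded instead of waiting for the whole crawl. The toy engine below illustrates that pattern with the standard library only. It is not Scrapling's internals: `FakeResponse`, `FollowRequest`, `PAGES`, and `run` are hypothetical, pages come from a dict instead of the network, and `FollowRequest` stands in for Scrapling's own Request objects.

```python
# Hypothetical toy engine; not Scrapling's internals. Pages come from a dict, not the network.
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Stand-in for a response: a URL plus pre-extracted titles and links."""
    url: str
    titles: list
    links: list


@dataclass
class FollowRequest:
    """Marker telling the toy engine to schedule another URL."""
    url: str


PAGES = {
    "/": FakeResponse("/", ["Widget"], ["/p2"]),
    "/p2": FakeResponse("/p2", ["Gadget", "Gizmo"], []),
}


async def parse(response):
    # Same shape as a spider callback: dicts are data, FollowRequest means "crawl this next".
    for title in response.titles:
        yield {"title": title}
    for link in response.links:
        yield FollowRequest(link)


async def run(start_urls):
    """Drain the callback for each page, streaming items to the caller as they appear."""
    queue = list(start_urls)
    while queue:
        response = PAGES[queue.pop(0)]  # stand-in for an actual fetch
        async for result in parse(response):
            if isinstance(result, FollowRequest):
                queue.append(result.url)
            else:
                yield result


async def main():
    return [item async for item in run(["/"])]


items = asyncio.run(main())
print(items)
```

Separating data from follow-up requests by type keeps the callback simple: it never touches the queue directly, and the engine decides what to schedule.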

Commands and code are distilled from the project's own documentation; always check the official repository for the latest.

When to use it

  • Scraping data from sites that frequently redesign their pages, where a self-healing parser keeps your selectors working instead of breaking.
  • Collecting data from sites protected by anti-bot systems such as Cloudflare Turnstile, using the stealthy fetchers.
  • Running large concurrent crawls across many pages with proxy rotation, pause/resume, and streaming results for pipelines or dashboards.

How Scrapling compares

Scrapling alongside other open-source web scraping & crawling tools AI/TLDR tracks, ranked by GitHub stars.

| Tool | GitHub stars | What it does |
|---|---|---|
| Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | Adaptive Python web scraping that survives site changes and bypasses anti-bot systems. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting. |
| Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site. |