Overview
Scrapling is an open-source Python framework for web scraping that handles everything from a single request to a full-scale crawl. Its parser learns from website changes and automatically relocates your elements when a page's structure updates, so selectors that would normally break keep working.
On top of the parser, Scrapling ships fetchers that send requests under the radar and can bypass anti-bot systems like Cloudflare's Turnstile out of the box. A Scrapy-like spider framework lets you scale up to concurrent, multi-session crawls with pause and resume, streaming results, and built-in proxy rotation, all in a few lines of Python.
What it does
- Adaptive parser: mark elements with auto_save, then pass adaptive=True later to re-find them after a site's design changes.
- Stealthy fetchers (Fetcher, StealthyFetcher, DynamicFetcher) that spoof browser fingerprints and bypass Cloudflare Turnstile and similar anti-bot checks.
- Scrapy-like Spider API with start_urls, async parse callbacks, and Request/Response objects for full crawls.
- Concurrent crawling with configurable concurrency limits, per-domain throttling, download delays, and built-in proxy rotation.
- Checkpoint-based pause and resume: stop a crawl with Ctrl+C and restart to continue from where it left off.
- Streaming mode, optional robots.txt compliance, a development cache mode, and built-in JSON/JSONL export.
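The adaptive parser bullet above is easier to picture with a toy example. The sketch below is not Scrapling's algorithm or API. It uses only the Python standard library and assumes a simple approach: remember an element's tag, attributes, ancestor path, and text, then pick the most similar element once the old selector stops matching. The helper names (ElementCollector, collect, similarity, relocate) and the sample HTML are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collect a flat list of elements with tag, attributes, ancestor path, and text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements
        self.elements = []  # every element seen, in document order

    def handle_starttag(self, tag, attrs):
        element = {
            "tag": tag,
            "attrs": dict(attrs),
            "path": "/".join(e["tag"] for e in self.stack),
            "text": "",
        }
        self.stack.append(element)
        self.elements.append(element)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags such as <br>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]["text"] += data.strip()


def collect(html):
    parser = ElementCollector()
    parser.feed(html)
    return parser.elements


def similarity(saved, candidate):
    """Average of four 0..1 scores: tag, ancestor path, attributes, text."""
    def ratio(a, b):
        return SequenceMatcher(None, a, b).ratio()

    return (
        float(saved["tag"] == candidate["tag"])
        + ratio(saved["path"], candidate["path"])
        + ratio(str(sorted(saved["attrs"].items())), str(sorted(candidate["attrs"].items())))
        + ratio(saved["text"], candidate["text"])
    ) / 4


def relocate(saved, html):
    """Return the element in `html` most similar to the saved fingerprint."""
    return max(collect(html), key=lambda el: similarity(saved, el))


OLD = '<html><body><div class="product"><h2 class="title">Blue Mug</h2></div></body></html>'
NEW = ('<html><body><section><div class="item-card"><h2 class="name">Blue Mug</h2>'
       '<span class="price">$9</span></div></section></body></html>')

# "Save" step: remember what the element looked like on the old page.
saved = next(el for el in collect(OLD) if el["attrs"].get("class") == "title")

# "Adaptive" step: h2.title no longer exists, so fall back to similarity scoring.
found = relocate(saved, NEW)
print(found["tag"], found["attrs"], found["text"])  # h2 {'class': 'name'} Blue Mug
```

A real implementation would need a richer fingerprint and a similarity threshold so that it can report "not found" instead of always returning the closest match.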
Getting started
Scrapling requires Python 3.10 or higher. The base install includes only the parser engine; add the fetchers extra and run the install command to get browsers and stealth dependencies for the fetchers and spiders.
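Before installing, a quick check like the following can confirm that the interpreter meets the stated minimum and show whether the package is already importable. It is a minimal sketch using only the standard library; the check_environment helper is a hypothetical name, not part of Scrapling.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def check_environment():
    """Return (python_ok, scrapling_installed) for the current interpreter."""
    python_ok = sys.version_info >= MIN_PYTHON
    # find_spec returns None when the top-level package cannot be found
    scrapling_installed = importlib.util.find_spec("scrapling") is not None
    return python_ok, scrapling_installed


python_ok, scrapling_installed = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_ok}")
print(f"scrapling importable: {scrapling_installed}")
```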
Install the parser
Install the core package from PyPI. This gives you the parser engine on its own, without the fetchers or browser dependencies.
pip install scrapling
Add fetchers and browsers
To use any fetcher or spider, install the fetchers extra and then run the install command to download the browsers and their system and fingerprint dependencies.
pip install "scrapling[fetchers]"
scrapling install
Fetch a page and scrape it
Use a fetcher to load a page, then select elements with CSS. Save them with auto_save, and pass adaptive=True later so Scrapling can re-find them if the site's structure changes.
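from scrapling.fetchers import StealthyFetcher
# Turn on adaptive mode for pages returned by this fetcher
StealthyFetcher.adaptive = True
# Fetch the page in a headless browser and wait for the network to go idle
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)
# First run: select the elements and save them for later
products = p.css('.product', auto_save=True)
# Later, after the site's structure changes: re-find the saved elements
products = p.css('.product', adaptive=True)
Scale up to a full crawl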
Define a spider with start URLs and an async parse callback, yielding the data you want from each response, then start the crawl.
from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com/"]

    async def parse(self, response: Response):
        # Yield one item per product element found in the response
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}

# Start the crawl
MySpider().start()
Commands and code are distilled from the project's own documentation. Always check the official repo for the latest.
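Checkpoint-based pause and resume, listed among the features, can be illustrated without any library. The sketch below is not Scrapling's implementation. It assumes the simplest possible design: persist the pending queue and the set of seen URLs to a JSON file after every page, and reload them on the next run. The SITE dictionary, the file name, and the helper functions are hypothetical stand-ins for real fetching and parsing.

```python
import json
import os
import tempfile
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c"],
    "/c": [],
}


def load_checkpoint(path, start_urls):
    """Restore the pending queue and seen set, or start fresh if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
        return deque(state["pending"]), set(state["seen"])
    return deque(start_urls), set(start_urls)


def save_checkpoint(path, pending, seen):
    """Write to a temp file, then rename, so an interrupted write cannot corrupt the state."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"pending": list(pending), "seen": sorted(seen)}, f)
    os.replace(tmp, path)


def crawl(path, start_urls, max_pages=None):
    """Visit pages breadth-first, checkpointing after each; return URLs visited in this run."""
    pending, seen = load_checkpoint(path, start_urls)
    visited = []
    while pending and (max_pages is None or len(visited) < max_pages):
        url = pending.popleft()
        visited.append(url)
        for link in SITE[url]:  # stand-in for fetching and parsing the page
            if link not in seen:
                seen.add(link)
                pending.append(link)
        save_checkpoint(path, pending, seen)
    if not pending and os.path.exists(path):
        os.remove(path)  # crawl finished, so the next run starts fresh
    return visited


checkpoint = os.path.join(tempfile.mkdtemp(), "crawl.json")
first_run = crawl(checkpoint, ["/"], max_pages=2)  # simulates stopping early
second_run = crawl(checkpoint, ["/"])              # resumes from the checkpoint
print(first_run, second_run)  # ['/', '/a'] ['/b', '/c']
```

The second run never revisits "/" or "/a" because both the queue and the seen set survive the interruption. Saving after every page keeps the sketch simple; a production crawler would likely batch checkpoint writes.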
When to use it
- Scraping data from sites that frequently redesign their pages, where a self-healing parser keeps your selectors working instead of breaking.
- Collecting data from sites protected by anti-bot systems such as Cloudflare Turnstile, using the stealthy fetchers.
- Running large concurrent crawls across many pages with proxy rotation, pause/resume, and streaming results for pipelines or dashboards.
How Scrapling compares
How Scrapling stacks up against the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Firecrawl | ★ 135k | A crawling service and API that converts whole websites into clean Markdown or structured JSON ready for LLMs. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | An adaptive Python web scraping framework that survives site changes and bypasses anti-bot systems. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting. |
| Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site. |