Overview
Firecrawl is a crawling and scraping service with an API that converts websites into clean Markdown, structured JSON, HTML, or screenshots. It handles the parts of web scraping that usually break: JavaScript-heavy pages, rotating proxies, rate limits, and orchestration, so you get usable data back without managing that infrastructure yourself.
It is aimed at developers building AI applications and agents who need web content in a form a language model can read. Instead of writing custom parsers for every site, you call a single endpoint and get LLM-ready output. Official SDKs are available for Python and Node.js, plus a CLI and direct HTTP access.
Within the web scraping and crawling category, Firecrawl covers the full path from finding sources to extracting content: search the web, scrape one URL, crawl an entire site, map all of a site's URLs, batch-scrape thousands of pages, and interact with pages before pulling content. It is open source and also offered as a hosted service.
What it does
- Scrape any URL into Markdown, HTML, structured JSON, or a screenshot with one request
- Crawl an entire website's URLs, or use Map to discover all of a site's links instantly
- Search the web and get the full page content from each result
- Batch-scrape thousands of URLs asynchronously
- Parse content from web-hosted files such as PDFs and DOCX
- Run actions like click, scroll, write, wait, and press on a page before extracting content
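As a rough illustration of the last item, the sketch below builds a scrape request body that runs a few actions before extraction. Only the five action names come from the list above. The field names (`url`, `formats`, `actions`, `selector`, `milliseconds`, `text`, `key`, `direction`), the example URL, and the selector are assumptions for illustration, so check them against the official API reference before relying on them.

```python
from typing import Any

# Action names taken from the feature list above.
ALLOWED_ACTIONS = {"click", "scroll", "write", "wait", "press"}


def build_scrape_request(url: str, actions: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a JSON-serializable request body with pre-extraction actions."""
    for action in actions:
        kind = action.get("type")
        if kind not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action type: {kind!r}")
    # Field names here are assumed, not confirmed by this page.
    return {"url": url, "formats": ["markdown"], "actions": actions}


request_body = build_scrape_request(
    "https://example.com/search",  # hypothetical target page
    [
        {"type": "wait", "milliseconds": 1000},        # let the page settle
        {"type": "click", "selector": "#search-box"},  # hypothetical selector
        {"type": "write", "text": "firecrawl"},        # type into the focused field
        {"type": "press", "key": "Enter"},             # submit the query
        {"type": "scroll", "direction": "down"},       # load more results
    ],
)
```

Validating action names locally catches typos before a request is sent; the body itself would still need to be posted with your API key.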
Getting started
Sign up at firecrawl.dev to get an API key, install an SDK, then make your first scrape call. The examples below use Python; equivalent Node.js, cURL, and CLI usage is shown in the README.
Install the SDK
Install the Python client with pip, or the Node.js client with npm.
pip install firecrawl-py
Scrape a single page
Create a client with your API key and scrape any URL to get LLM-ready output.
from firecrawl import Firecrawl
# Create a client authenticated with your API key.
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Scrape a single page.
result = app.scrape('firecrawl.dev')
Search the web
Run a web search and get the full page content from the top results.
search_result = app.search("firecrawl", limit=5)
Use the CLI instead
If you prefer the command line, the CLI exposes the same scrape and search commands.
firecrawl scrape https://firecrawl.dev
firecrawl search "firecrawl" --limit 5
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
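The scrape and search calls above can be combined into one small helper. The sketch below is a minimal example, assuming only the two method calls already shown (`scrape(url)` and `search(query, limit=...)`). `ScrapeSearchClient`, `Snapshot`, `scrape_with_context`, and `FakeClient` are hypothetical names introduced here, and the shape of the real return values is not assumed, so the helper passes them through untouched.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class ScrapeSearchClient(Protocol):
    """The two calls used in the examples above."""

    def scrape(self, url: str) -> Any: ...

    def search(self, query: str, limit: int = 5) -> Any: ...


@dataclass
class Snapshot:
    page: Any      # whatever scrape() returned
    related: Any   # whatever search() returned


def scrape_with_context(
    client: ScrapeSearchClient, url: str, query: str, limit: int = 5
) -> Snapshot:
    """Scrape one URL, then run a related web search with the same client."""
    page = client.scrape(url)
    related = client.search(query, limit=limit)
    return Snapshot(page=page, related=related)


class FakeClient:
    """Hypothetical offline stand-in so the sketch runs without an API key."""

    def scrape(self, url: str) -> Any:
        return {"url": url, "markdown": "# Example"}

    def search(self, query: str, limit: int = 5) -> Any:
        return [{"title": f"{query} result {i}"} for i in range(limit)]


snapshot = scrape_with_context(FakeClient(), "firecrawl.dev", "firecrawl", limit=3)
```

With the real SDK you would pass the `app` object created earlier in place of `FakeClient()`. Typing the client as a protocol keeps the helper easy to test offline.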
When to use it
- Feed clean, up-to-date web content into a RAG pipeline or LLM prompt without writing per-site parsers
- Give an AI agent the ability to search, read, and act on live web pages
- Crawl a documentation site or knowledge base and convert every page to Markdown for indexing
- Batch-scrape large lists of URLs and extract structured JSON fields from each page
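For the indexing use case, Markdown output usually needs to be split into chunks before it goes into a search index or vector store. The sketch below shows one possible approach, splitting on Markdown headings. It is not part of Firecrawl; `Chunk` and `chunk_markdown` are hypothetical names, and a production pipeline may also want size limits or overlap between chunks.

```python
import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # code-fence marker, built indirectly to keep this block simple


@dataclass
class Chunk:
    heading: str
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading.

    Text before the first heading becomes a chunk with an empty heading.
    Lines inside fenced code blocks are never treated as headings.
    """
    chunks: list[Chunk] = []
    heading = ""
    lines: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(lines).strip()
        if heading or text:
            chunks.append(Chunk(heading=heading, text=text))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()  # close the previous section
            heading = match.group(2).strip()
            lines = []
        else:
            lines.append(line)
    flush()  # close the final section
    return chunks


sample = "intro\n# Install\npip install firecrawl-py\n## Usage\nCall scrape.\n"
sample_chunks = chunk_markdown(sample)
```

Each chunk keeps its heading, which can be stored as metadata alongside the text when indexing.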
How Firecrawl compares
Firecrawl alongside the other open-source web scraping and crawling tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Firecrawl | ★ 135k | A crawling and scraping API that turns any website into clean Markdown or structured JSON for LLM apps. |
| Crawl4AI | ★ 68.9k | A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines. |
| Scrapling | ★ 65k | A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls. |
| Scrapy | ★ 62.3k | A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale. |
| ScrapeGraphAI | ★ 27.4k | A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts. |
| Colly | ★ 25.3k | A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting. |
| Crawlee | ★ 23.8k | A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers. |
| Katana | ★ 17.1k | A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site. |