Firecrawl

Turn any website into clean Markdown or structured JSON for your LLM apps

github.com/firecrawl/firecrawl · ★ 135k · firecrawl.dev

Overview

Firecrawl is a crawling and scraping service with an API that converts websites into clean Markdown, structured JSON, HTML, or screenshots. It handles the parts of web scraping that usually break: JavaScript-heavy pages, rotating proxies, rate limits, and orchestration, so you get usable data back without managing that infrastructure yourself.

It is aimed at developers building AI applications and agents who need web content in a form a language model can read. Instead of writing custom parsers for every site, you call a single endpoint and get LLM-ready output. Official SDKs are available for Python and Node.js, plus a CLI and direct HTTP access.
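
For direct HTTP access, a scrape is a JSON POST authenticated with a bearer token. The sketch below builds that request with Python's standard library but does not send it. The `https://api.firecrawl.dev/v2/scrape` path and the `url`/`formats` body fields are assumptions based on the public API docs at the time of writing, so confirm them against the current API reference.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed endpoint path; check the official API reference for the current version.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_request(
    api_key: str, url: str, formats: Optional[List[str]] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Firecrawl to scrape `url`."""
    body = {"url": url, "formats": formats or ["markdown"]}
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_scrape_request("fc-YOUR_API_KEY", "https://firecrawl.dev")
# Sending it would be: json.load(urllib.request.urlopen(req))
```

The SDKs wrap this same request, so raw HTTP is mainly useful from languages without an official client.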

Within the web scraping and crawling category, Firecrawl covers the full path from finding sources to extracting content: search the web, scrape one URL, crawl an entire site, map all of a site's URLs, batch-scrape thousands of pages, and interact with pages before pulling content. It is open source and also offered as a hosted service.
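
One common path through those steps is map, then filter, then batch-scrape. The helper below is a hypothetical sketch of the middle step: given the URLs a Map call returns (passed in here as plain strings), it keeps only same-host pages under a path prefix and drops duplicates, so the batch job scrapes only what you need. It is plain Python and not part of the Firecrawl SDK.

```python
from urllib.parse import urlsplit, urlunsplit


def filter_mapped_urls(urls, host, path_prefix="/"):
    """Keep unique URLs on `host` whose path starts with `path_prefix`.

    Fragments are stripped so `/docs#intro` and `/docs` count as one page.
    Order of first appearance is preserved.
    """
    seen = set()
    kept = []
    for raw in urls:
        parts = urlsplit(raw)
        path = parts.path or "/"  # treat a bare host as the root path
        if parts.netloc != host or not path.startswith(path_prefix):
            continue
        # Drop the fragment; it does not change which page is fetched.
        normalized = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
        if normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept


mapped = [
    "https://docs.example.com/guide/intro",
    "https://docs.example.com/guide/intro#setup",
    "https://docs.example.com/blog/launch",
    "https://other.example.com/guide/intro",
]
print(filter_mapped_urls(mapped, "docs.example.com", "/guide/"))
```

Filtering before the batch step keeps credit usage proportional to the pages you actually plan to index.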

What it does

Scrape any URL into Markdown, HTML, structured JSON, or a screenshot with one request
Crawl an entire website, or use Map to get a list of a site's URLs without scraping each page
Search the web and get the full page content from each result
Batch-scrape thousands of URLs asynchronously
Parse content from web-hosted files such as PDFs and DOCX
Run actions like click, scroll, write, wait, and press on a page before extracting content
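
Actions are sent as an ordered list alongside the scrape request and run in sequence before the content is extracted. The sketch below builds such a request body as plain dicts. The field names (`type`, `milliseconds`, `selector`, `text`, `key`, `direction`) are modeled on the action format in Firecrawl's API documentation but should be treated as assumptions and checked against the current reference; the CSS selector is a hypothetical placeholder.

```python
import json


def build_actions_payload(url):
    """Return a scrape request body that interacts with the page before extracting.

    Action field names are assumptions modeled on Firecrawl's documented
    action format; the selector below is a hypothetical placeholder.
    """
    actions = [
        {"type": "wait", "milliseconds": 2000},          # let client-side JS render
        {"type": "click", "selector": "#search-input"},  # focus a (hypothetical) search box
        {"type": "write", "text": "pricing"},            # type into the focused element
        {"type": "press", "key": "ENTER"},               # submit the search
        {"type": "wait", "milliseconds": 2000},          # wait for results to load
        {"type": "scroll", "direction": "down"},         # reveal lazily loaded content
    ]
    return {"url": url, "formats": ["markdown"], "actions": actions}


payload = build_actions_payload("https://example.com")
print(json.dumps(payload, indent=2))
```

The body is POSTed to the scrape endpoint like any other scrape request; the Markdown returned reflects the page state after the last action.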

Getting started

Sign up at firecrawl.dev to get an API key, install an SDK, then make your first scrape call. The examples below use Python; equivalent Node.js, cURL, and CLI usage is shown in the README.

Install the SDK

Install the Python client with pip, or the Node.js client with npm.

bash

pip install firecrawl-py
npm install @mendable/firecrawl-js

Scrape a single page

Create a client with your API key and scrape any URL to get LLM-ready output.

python

from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

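# Pass a full URL including the scheme, and request Markdown output
result = app.scrape("https://firecrawl.dev", formats=["markdown"])
print(result.markdown)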

Search the web

Run a web search and get the top results; search can also scrape the full page content of each result.

python

search_result = app.search("firecrawl", limit=5)

Use the CLI instead

If you prefer the command line, the CLI exposes the same scrape and search commands.

bash

firecrawl scrape https://firecrawl.dev
firecrawl search "firecrawl" --limit 5

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Feed clean, up-to-date web content into a RAG pipeline or LLM prompt without writing per-site parsers
Give an AI agent the ability to search, read, and act on live web pages
Crawl a documentation site or knowledge base and convert every page to Markdown for indexing
Batch-scrape large lists of URLs and extract structured JSON fields from each page
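
Once pages come back as Markdown, a RAG pipeline usually splits them into smaller pieces before embedding. The function below is a hypothetical, minimal sketch of one such strategy: it splits on Markdown headings and tags each chunk with its source URL. It is not part of Firecrawl, it does not special-case fenced code (where a `# comment` line would be mistaken for a heading), and real pipelines often also cap chunk length.

```python
import re

# Matches ATX headings such as "# Title" or "### Section" at the start of a line.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(markdown, source_url):
    """Split Markdown into heading-delimited chunks with source metadata."""
    chunks = []
    title, lines = None, []

    def flush():
        # Emit the lines collected under the current heading, skipping empty sections.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"source": source_url, "heading": title, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


page = "Intro text.\n\n# Install\nRun pip install.\n\n## Usage\nCall scrape."
chunks = chunk_markdown(page, "https://example.com/docs")
for chunk in chunks:
    print(chunk["heading"], "->", chunk["text"])
```

Keeping the heading and source URL as metadata lets a retriever cite where each answer came from.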

How Firecrawl compares

Firecrawl alongside other open-source web scraping and crawling tools tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
Firecrawl	★ 135k	Turn any website into clean Markdown or structured JSON for your LLM apps
Crawl4AI	★ 68.9k	A local-first Python web crawler that turns pages into clean Markdown for use in RAG and LLM pipelines.
Scrapling	★ 65k	A Python web scraping framework whose parser relocates your elements when pages change, with stealthy fetchers and a Scrapy-like spider engine for full crawls.
Scrapy	★ 62.3k	A mature Python framework for writing fast spiders that crawl websites and extract structured data at scale.
ScrapeGraphAI	★ 27.4k	A Python library that uses LLMs and a graph pipeline to extract data from pages based on natural-language prompts.
Colly	★ 25.3k	A Go scraping framework for building fast crawlers with request handling, callbacks, and rate limiting.
Crawlee	★ 23.8k	A Node.js/TypeScript scraping library with proxy rotation and browser fingerprinting for building reliable crawlers.
Katana	★ 17.1k	A fast Go command-line crawler that discovers every URL, endpoint, and JavaScript file on a target site.
