How to Send PDFs to an LLM API (Document Inputs Explained)

Learn every way to get a PDF into a model — native document inputs, Files APIs, and text extraction — and when each one is the right call.

Intermediate | 14 min read | Updated 2026-06-12

In plain English

A year ago, sending a PDF to a language model meant first converting it to text yourself — stripping out the formatting, losing the tables, and hoping the model could make sense of the garbled output. Today, the three biggest API providers — Anthropic Claude, OpenAI GPT, and Google Gemini — all let you attach a PDF directly to a chat message, the same way you'd attach a file to an email. The model reads both the text and the visual layout, then replies in plain language.

Think of it like handing a physical report to a very fast reader. When you give them the real document — not a blurry photocopy of just the words — they can see the tables, the charts, the column headers, and the footnotes. The model does the same thing: it processes each page both as extracted text and as an image, so nothing meaningful is lost in translation.

There are three main ways to get a PDF into a model call: inline base64 (encode the bytes and place them in the request JSON), Files API (upload once, reference by ID in many requests), and text extraction (parse the PDF yourself, send the resulting text). OpenAI and Gemini also accept a URL that the provider fetches at request time. Each approach has a different cost profile, latency characteristic, and quality ceiling, and the right choice depends on your document type and usage pattern.

Why it matters

PDFs are the dominant format for contracts, invoices, research papers, financial filings, technical manuals, and regulatory documents. Before native PDF support, every document-processing pipeline needed a separate extraction layer — pdfplumber, PyMuPDF, Apache PDFBox, or a third-party OCR service — that added latency, cost, and failure modes of its own. Today that layer is optional, and removing it often improves quality.

Contract review: attach a 60-page agreement and ask the model to flag non-standard clauses — no pre-processing required.
Invoice extraction: upload a scanned PDF invoice and receive structured JSON with line items, amounts, and dates.
Research Q&A: send a technical paper and ask follow-up questions page by page, with the model citing page numbers.
Compliance checking: compare a multi-section regulation PDF against a product description and identify gaps.
Report summarisation: feed a quarterly earnings PDF to a model and get a bullet-point summary in seconds.

Getting the input format right is the critical first step. An incorrectly encoded payload returns a cryptic 400 error; a missing beta header on the Claude Files API gets the request rejected; an oversized inline payload trips a request-body limit that is separate from the per-file limit. This guide shows how to avoid all three.

How it works

When a provider receives a PDF, it does two things simultaneously: it extracts the raw text layer (if the PDF has one) and it renders each page as an image. Both representations are tokenised and fed into the model together, which is why native PDF input handles scanned documents and PDFs with embedded charts far better than text-only extraction.

Request flow: PDF → API → Model → Response

Your PDF file: local disk, S3, or URL
Choose delivery method: inline base64, Files API, or URL
API request: JSON with document block + text prompt
Provider processes PDF: extracts text layer + renders page images
Dual tokenisation: text tokens + visual patch tokens merged
Model inference: reads text, layout, charts, and tables together
Text response: returned in the same chat message format

Delivery methods compared

Method	How it works	Best for	Watch out for
Inline base64	PDF bytes encoded as a base64 string, sent inside the JSON request body	One-off calls, local files, no hosting needed	Inflates payload ~33%; 32 MB cap on Claude and OpenAI
Files API (upload once)	Upload the PDF to the provider's storage, receive a file_id, reference it in any future request	Reusing the same document across many calls; multi-turn chats	Extra upload step; beta header required for Claude; files expire
URL input (OpenAI / Gemini)	Pass a publicly accessible HTTPS URL; provider fetches the PDF at request time	PDFs already hosted on a CDN or public storage bucket	URL must be reachable from provider servers; secrets in URLs are exposed
Text extraction (DIY)	Parse the PDF yourself with a library, send the resulting text as a normal string	Text-heavy docs where layout doesn't matter; lowest cost	Loses charts, tables, and scanned pages; adds pipeline complexity

Provider limits at a glance

Provider	Max pages	Max file size	Delivery options
Anthropic Claude	100 pages	32 MB per request	Inline base64, Files API (file_id)
OpenAI GPT	100 pages	32 MB total across files	Inline base64, Files API (file_id), HTTPS URL
Google Gemini	1,000 pages	50 MB inline; 2 GB via Files API	Inline base64, Files API, GCS URL, HTTPS URL
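
The limits in the table can be turned into a small pre-flight check that runs before any request is built. The sketch below is illustrative only: the names `LIMITS`, `check_limits`, and `page_ranges` are hypothetical, and the numbers are copied from the table above, so they should be treated as assumptions that providers may revise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    max_pages: int
    max_inline_mb: int


# Values copied from the table above; treat them as assumptions, not guarantees.
LIMITS = {
    "claude": Limit(max_pages=100, max_inline_mb=32),
    "openai": Limit(max_pages=100, max_inline_mb=32),
    "gemini": Limit(max_pages=1000, max_inline_mb=50),
}


def check_limits(provider: str, pages: int, size_mb: float) -> list[str]:
    """Return a list of problems; an empty list means the PDF fits inline."""
    limit = LIMITS[provider]
    problems = []
    if pages > limit.max_pages:
        problems.append(f"{pages} pages exceeds {limit.max_pages}")
    if size_mb > limit.max_inline_mb:
        problems.append(f"{size_mb} MB exceeds {limit.max_inline_mb} MB")
    return problems


def page_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]
```

For a 250-page document and a 100-page cap, `page_ranges(250, 100)` yields three ranges that can each be split out and sent as a separate request.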

Code examples: Claude, GPT, and Gemini

Anthropic Claude — inline base64

Claude's Messages API accepts a document content block. Set source.type to "base64", source.media_type to "application/pdf", and supply the base64-encoded bytes in source.data. No special beta header is needed for inline PDF — that is generally available.

import anthropic
import base64

with open("contract.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarise the key obligations in this contract.",
                },
            ],
        }
    ],
)
print(response.content[0].text)

Anthropic Claude — Files API (upload once, reuse)

For PDFs you query repeatedly (a product manual, a legal handbook, a reference specification), upload once with client.beta.files.upload() and reuse the file_id across as many calls as you like. Requests need the anthropic-beta: files-api-2025-04-14 header. The SDK adds it automatically for calls in the beta.files namespace; on the message call that references the file, pass betas=["files-api-2025-04-14"] to client.beta.messages.create().

import anthropic

client = anthropic.Anthropic()

# Upload once
with open("technical-manual.pdf", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("technical-manual.pdf", f, "application/pdf"),
    )
file_id = uploaded.id  # e.g. "file_abc123"

# Reuse in every subsequent call
response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": file_id,
                    },
                },
                {
                    "type": "text",
                    "text": "What does section 4 say about warranty exclusions?",
                },
            ],
        }
    ],
    betas=["files-api-2025-04-14"],
)
print(response.content[0].text)
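
The "upload once" part only pays off if the file_id is stored somewhere between runs. One possible way to do that is a small local cache keyed on the PDF's content hash, sketched below. The helper name `get_or_upload_file_id` and the `file_ids.json` cache file are hypothetical; the only SDK call used is the `client.beta.files.upload()` call shown above.

```python
import hashlib
import json
from pathlib import Path


def get_or_upload_file_id(client, pdf_path: str, cache_path: str = "file_ids.json") -> str:
    """Return a cached file_id for this exact PDF content, uploading only on a cache miss."""
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    # Key on content rather than filename, so an edited PDF is uploaded again.
    digest = hashlib.sha256(Path(pdf_path).read_bytes()).hexdigest()
    if digest in cache:
        return cache[digest]

    with open(pdf_path, "rb") as f:
        uploaded = client.beta.files.upload(
            file=(Path(pdf_path).name, f, "application/pdf"),
        )

    cache[digest] = uploaded.id
    cache_file.write_text(json.dumps(cache, indent=2))
    return uploaded.id
```

This sketch does not check whether a cached file still exists on the provider side. If files can be deleted or expire in your setup, a failed request is the signal to drop the cache entry and upload again.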

OpenAI GPT — Responses API with file_id

OpenAI supports PDF inputs in both the Chat Completions API and the newer Responses API. Upload with client.files.create(purpose="user_data"), then reference the returned file_id in the request. You can also send a base64 data URI or a public HTTPS URL directly.

import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from env

# Upload once
with open("earnings-report.pdf", "rb") as f:
    uploaded = client.files.create(file=f, purpose="user_data")
file_id = uploaded.id  # e.g. "file-abc123"

# Reference in a Responses API call
response = client.responses.create(
    model="gpt-5.5",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file_id,
                },
                {
                    "type": "input_text",
                    "text": "What was the net revenue for Q3?",
                },
            ],
        }
    ],
)
print(response.output_text)
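
The other two OpenAI delivery options mentioned above, a base64 data URI and a public HTTPS URL, use the same input_file content part with different fields. The sketch below builds both parts as plain dicts. The helper names are hypothetical, and the field names (`filename`, `file_data`, `file_url`) reflect my understanding of the Responses API, so check them against the current API reference before relying on them.

```python
import base64


def input_file_from_bytes(pdf_bytes: bytes, filename: str) -> dict:
    """Build an input_file part that carries the PDF inline as a base64 data URI."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "input_file",
        "filename": filename,
        "file_data": f"data:application/pdf;base64,{encoded}",
    }


def input_file_from_url(url: str) -> dict:
    """Build an input_file part that asks the provider to fetch the PDF itself."""
    if not url.startswith("https://"):
        raise ValueError("expected an HTTPS URL")
    return {"type": "input_file", "file_url": url}
```

Either dict can take the place of the file_id part in the content list of the request above.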

Google Gemini — inline base64 and Files API

Gemini accepts PDFs either inline (base64 with mime_type: "application/pdf") or via the Files API for larger documents. The inline limit is 50 MB; for larger PDFs use client.files.upload(). Gemini also accepts direct HTTPS URLs and Google Cloud Storage (gs://) URIs.

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from env

# --- Option A: inline bytes (up to 50 MB); the SDK encodes them for the request ---
with open("research-paper.pdf", "rb") as f:
    pdf_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.1-pro",
    contents=[
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "List the three main findings of this paper.",
    ],
)
print(response.text)

# --- Option B: Files API (for larger or reused PDFs) ---
uploaded = client.files.upload(file="large-report.pdf")
response2 = client.models.generate_content(
    model="gemini-3.1-pro",
    contents=[uploaded, "Summarise the executive summary section."],
)
print(response2.text)

Native PDF input vs. text extraction: when to use each

Native PDF input is not always the right call. Sending a PDF activates the model's visual pipeline, which reads every page as an image. That is powerful for complex layouts, but it costs more tokens than sending clean text, and it runs slower. For high-volume pipelines or simple text-heavy documents, pre-extracting the text with a library like pdfplumber, PyMuPDF (fitz), or Microsoft's open-source markitdown can cut costs by 50% or more while delivering equally good results.

Scenario	Recommended approach	Reason
Scanned PDF (image-only, no text layer)	Native PDF input	No text to extract; model must use vision
PDF with complex tables, charts, or mixed layout	Native PDF input	Visual pipeline preserves layout context that text strips away
Text-heavy PDF with a clean text layer (contract, paper)	Text extraction (pdfplumber / PyMuPDF)	50%+ token savings; extraction quality is equivalent
Same PDF queried many times	Files API upload (any provider)	Upload cost paid once; no repeated encoding overhead
Large PDF > 32 MB / > 100 pages (Claude / OpenAI)	Gemini Files API or chunked text extraction	Claude and OpenAI hard limits; Gemini supports up to 1,000 pages
RAG pipeline with many small chunks	Text extraction + embedding	Chunk boundaries controlled by you; cheaper than per-chunk PDF calls
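
The decision table can be expressed as a small routing function. This is a sketch rather than a rule: the function name and return labels are hypothetical, the 100-page and 32 MB thresholds come from the table, and the order in which the checks run is my own assumption about precedence (size limits first, then visual content, then the text-extraction default). The RAG row is left out because it is a pipeline choice rather than a per-document one.

```python
def recommend_approach(
    has_text_layer: bool,
    complex_layout: bool,
    pages: int,
    size_mb: float,
    reuse_count: int = 1,
) -> str:
    """Map a document's traits to one of the approaches in the table above."""
    # Hard limits on Claude and OpenAI come first: nothing else matters if the PDF won't fit.
    if pages > 100 or size_mb > 32:
        return "gemini-files-api-or-chunked-text-extraction"
    # Scanned pages or meaningful layout need the visual pipeline.
    if not has_text_layer or complex_layout:
        # Upload once if the same document will be queried repeatedly.
        return "files-api-upload" if reuse_count > 1 else "native-pdf-input"
    # Clean, text-heavy documents are cheaper as extracted text.
    return "text-extraction"
```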

Token cost reality check

Native PDF input tokenises each page twice — once as text tokens, once as visual patch tokens. A typical dense A4 page runs to roughly 1,500–2,500 tokens in combined form. A 50-page PDF can therefore consume 75,000–125,000 input tokens before you've written a single word of your prompt. Multiply that token count by your model's per-token input price to estimate the cost of a single document query. Extracting the same text with PyMuPDF and sending it as a plain string typically halves that token count.
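
The arithmetic above is easy to wrap in two helpers. This is a rough estimator under the stated assumption of 1,500 to 2,500 tokens per page; the function names are hypothetical, and the price passed in is whatever your model's current input rate is (the code does not assume one).

```python
def estimate_input_tokens(
    pages: int,
    tokens_per_page: tuple[int, int] = (1500, 2500),
) -> tuple[int, int]:
    """Return a (low, high) estimate of input tokens for a native PDF of this length."""
    low, high = tokens_per_page
    return pages * low, pages * high


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into a cost at the given per-million-token input price."""
    return tokens / 1_000_000 * usd_per_million_tokens
```

For the 50-page example, `estimate_input_tokens(50)` gives the same 75,000 to 125,000 range quoted above. Real counts vary with page density, so the usage figures returned by the API remain the source of truth.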

Common pitfalls and how to avoid them

Missing beta header on Claude Files API. Referencing a file_id without the files-api-2025-04-14 beta header causes a 400. Always use client.beta.messages.create() and pass betas=["files-api-2025-04-14"].
Confusing per-file size with request-body size. Claude and OpenAI both have a 32 MB limit — but that is the request body limit, not a per-file limit. A 24 MB PDF becomes approximately 32 MB after base64 encoding, leaving almost no room for the prompt. Use the Files API for any PDF over ~20 MB.
Password-protected PDFs. Encrypted PDFs that require a password to open will fail silently or return garbage. Decrypt with pikepdf or qpdf before sending.
Sending PDF via image block instead of document block (Claude). Claude requires a document block with media_type: "application/pdf". Using an image block with PDF data is rejected.
Private or expiring URLs. When using URL-based input on OpenAI or Gemini, the provider's servers must be able to fetch the URL at request time. With a signed S3 URL that expires in 60 seconds, or a CDN behind IP allowlisting, the provider's fetch will be refused (typically with a 403) and your API call will fail.
Assuming page order is preserved in extraction. When you extract text yourself with some libraries, multi-column layouts scramble reading order. If coherent reading order matters, test your extraction output before assuming it is correct.
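
The request-body pitfall can be caught before the request is sent, because the size of a base64 encoding is fully determined by the input length. The sketch below assumes the 32 MB limit means 32 × 1024 × 1024 bytes and reserves a small, arbitrary allowance for the JSON envelope; both are assumptions, and the function names are hypothetical.

```python
MAX_REQUEST_BYTES = 32 * 1024 * 1024  # 32 MB request-body limit quoted above (assumed binary MB)


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_bytes bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(pdf_bytes: int, prompt_bytes: int = 0, overhead_bytes: int = 4096) -> bool:
    """True if the encoded PDF plus prompt plus JSON overhead stays within the limit."""
    total = base64_size(pdf_bytes) + prompt_bytes + overhead_bytes
    return total <= MAX_REQUEST_BYTES
```

A 24 MB file encodes to exactly 32 MB, so `fits_inline` rejects it once any prompt or envelope is added, which matches the advice to switch to the Files API well before that point.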

Going deeper

Once the basic pipeline is working, several advanced patterns become worth exploring.

Citations and page-level grounding

Anthropic's citations feature (available in the direct API and in Amazon Bedrock) lets the model return structured citations that point to the exact passage in the source document it drew from. This is especially valuable for legal, regulatory, and medical use cases where you need an audit trail. Set citations: {enabled: true} on each document block in the request, and text blocks in the response can then carry citations with a document_index and, for PDFs, a start_page_number and end_page_number.
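
A minimal sketch of both halves of that flow follows: building a document block with citations switched on, and pulling page references out of the response content. It works on plain dicts, on the assumption that you convert the SDK response first (for example with `response.model_dump()["content"]`). The helper names are hypothetical, and the `page_location` citation shape is my understanding of the response format rather than something stated in this guide.

```python
def pdf_document_block(pdf_base64: str, title: str) -> dict:
    """Build a Claude document block for a base64 PDF with citations enabled."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": pdf_base64,
        },
        "title": title,
        "citations": {"enabled": True},
    }


def collect_page_citations(content_blocks: list[dict]) -> list[dict]:
    """Flatten page-level citations from a list of response content blocks (as dicts)."""
    found = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        # Blocks without citations may omit the key or set it to None.
        for citation in block.get("citations") or []:
            if citation.get("type") == "page_location":
                found.append({
                    "text": block["text"],
                    "document_index": citation["document_index"],
                    "start_page_number": citation["start_page_number"],
                    "cited_text": citation.get("cited_text", ""),
                })
    return found
```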

Multi-document comparison

All three providers support multiple document blocks in a single message. You can compare two contracts, diff two versions of a specification, or cross-reference a regulation against a product policy in one round-trip. Add each PDF as a separate content block and label them in your prompt (Document A, Document B) so the model can reference them unambiguously. Watch the token budget — two 50-page PDFs at 2,000 tokens per page will cost 200,000 input tokens per query.
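
As an illustration of the labelling advice, the sketch below builds a Claude-style content list in which each PDF is preceded by a short text label, and includes a one-line budget check matching the arithmetic above. The function names are hypothetical, and the 2,000 tokens-per-page figure is the same rough assumption used in the paragraph.

```python
def build_comparison_content(documents: dict[str, str], question: str) -> list[dict]:
    """Build a content list from {label: base64 PDF data}, labelling each document."""
    content = []
    for label, pdf_base64 in documents.items():
        # A text part before each document lets the prompt refer to it by name.
        content.append({"type": "text", "text": f"{label}:"})
        content.append({
            "type": "document",
            "source": {
                "type": "base64",
                "media_type": "application/pdf",
                "data": pdf_base64,
            },
        })
    content.append({"type": "text", "text": question})
    return content


def estimate_query_tokens(page_counts: list[int], tokens_per_page: int) -> int:
    """Rough input-token budget for sending all documents in one request."""
    return sum(page_counts) * tokens_per_page
```

The returned list goes in as the "content" of a single user message, in the same position as the two-part list in the Claude examples earlier.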

Batch processing for high volume

When processing hundreds of PDFs, use the provider's Batch API rather than synchronous calls. Both Anthropic and OpenAI offer batch endpoints that take many requests at once (OpenAI as an uploaded JSONL file, Anthropic as a list of requests in the API call), process them asynchronously, and bill them at roughly half the standard rate. Upload your PDFs to the Files API first (paying the upload cost once), then reference the same file_id across thousands of batch items, so you are not re-sending the bytes for each job.
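
For the OpenAI side, a batch input file is one JSON request per line. The sketch below generates such lines for several questions against one uploaded PDF. Treat it as a starting point only: the helper name is hypothetical, the line shape (`custom_id`, `method`, `url`, `body`) reflects my understanding of the Batch API, and whether file_id inputs are accepted inside batch requests is an assumption to verify against the current documentation.

```python
import json


def build_batch_lines(file_id: str, questions: list[str], model: str) -> str:
    """Return JSONL text with one Responses API request per question about the same file."""
    lines = []
    for i, question in enumerate(questions, start=1):
        request = {
            "custom_id": f"question-{i}",  # used to match results back to requests
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": model,
                "input": [
                    {
                        "role": "user",
                        "content": [
                            {"type": "input_file", "file_id": file_id},
                            {"type": "input_text", "text": question},
                        ],
                    }
                ],
            },
        }
        lines.append(json.dumps(request))
    return "".join(line + "\n" for line in lines)
```

The resulting text would be written to a .jsonl file and uploaded before creating the batch job. Every line carries the same file_id, so the PDF bytes are sent only once.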

Hybrid pipeline: OCR first, model second

For very large document archives where both cost and latency matter, consider a staged approach: run a lightweight OCR tool (Tesseract, AWS Textract, Google Document AI) to extract text and table structure from PDFs, then send the clean Markdown output to the LLM. The LLM call becomes a straightforward text-in, text-out task — fast and cheap. Reserve native PDF input for documents where the OCR quality is inadequate or where the visual layout is itself meaningful (engineering drawings, complex financial statements).
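
One way to decide when OCR quality is "inadequate" is a simple gate on how much text each page produced. The heuristic below is a sketch: the function name and both thresholds are arbitrary assumptions that need tuning on your own documents, and it takes per-page text from whichever OCR or extraction tool you already run.

```python
def needs_native_pdf(
    page_texts: list[str],
    min_chars_per_page: int = 200,
    max_sparse_fraction: float = 0.2,
) -> bool:
    """Flag a document for native PDF input when too many pages yield too little text."""
    if not page_texts:
        return True  # nothing was extracted at all
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > max_sparse_fraction
```

Documents that pass the gate go down the cheap text path; the rest fall back to native PDF input. A character count will not detect scrambled reading order, so it complements rather than replaces a manual spot check.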

Structured output extraction from PDFs

Combine native PDF input with a JSON schema prompt to turn any document into a typed data structure. Describe the fields you want, for example {"invoice_number": "string", "line_items": [{"description": "string", "amount": "number"}], "total": "number"}, instruct the model to return only valid JSON, then pipe the output to json.loads(). On OpenAI, structured outputs (response_format: {type: "json_schema", json_schema: {...}} in Chat Completions, text.format in the Responses API) constrain the output to the schema at the API level, which removes most parse failures. You should still handle refusals and responses cut off by the output token limit.
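
Whichever provider produces the JSON, it is worth validating the parsed result before it reaches downstream code. The sketch below checks the invoice shape described above using only the standard library. The function name is hypothetical, and the checks are deliberately minimal.

```python
import json


def parse_invoice(raw: str) -> dict:
    """Parse model output and verify it matches the invoice shape; raise ValueError if not."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("invoice_number"), str):
        raise ValueError("invoice_number must be a string")
    if not isinstance(data.get("total"), (int, float)):
        raise ValueError("total must be a number")
    items = data.get("line_items")
    if not isinstance(items, list):
        raise ValueError("line_items must be a list")
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("each line item must be an object")
        if not isinstance(item.get("description"), str):
            raise ValueError("line item description must be a string")
        if not isinstance(item.get("amount"), (int, float)):
            raise ValueError("line item amount must be a number")
    return data
```

A ValueError here is a reasonable trigger for a single retry with the validation message appended to the prompt.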

FAQ

Does Claude's API support sending PDFs directly, or do I have to convert them first?

Claude supports PDFs natively. You can send a PDF as an inline base64 document block without any pre-conversion. For PDFs you query repeatedly, Claude's Files API lets you upload once and reference the file by ID in every subsequent call, which is more efficient.

What is the difference between the Files API and sending a base64-encoded PDF?

With base64 inline, the PDF bytes travel with every API request. That is convenient for one-off calls but wasteful for repeated queries, because you re-encode and re-upload the whole file each time. With the Files API, you upload the PDF once, get a reusable file_id, and reference that ID in as many requests as you like until the file expires or is deleted. The Files API also lets you stay well below request-body size limits for large documents.

How many pages can I send to the API at once?

Claude and OpenAI both cap PDF inputs at 100 pages per request. Google Gemini supports up to 1,000 pages inline or via the Files API. If your document exceeds these limits, split it into chunks by page range before sending, or pre-extract the text.

When should I extract PDF text myself instead of sending the raw PDF?

Text extraction makes sense for documents with a clean embedded text layer — contracts, research papers, plain reports — where the visual layout is not meaningful. It can cut token costs by 50% or more. Stick with native PDF input for scanned documents, PDFs with charts or complex tables, or any case where losing the visual context degrades the model's output quality.

Why am I getting a 400 error when I send a PDF to the Claude API?

The two most common causes are (1) using the wrong block type — PDFs need a document block, not an image block — and (2) a missing beta header when using the Files API. If you are referencing a file_id, you must call client.beta.messages.create() and include betas=["files-api-2025-04-14"]. Also check that the total request body (prompt + encoded PDF) does not exceed 32 MB.

Can I send a password-protected PDF to the API?

No. All three providers will either reject an encrypted PDF or return empty or garbled content. Decrypt the PDF first using a library like pikepdf (Python) or qpdf (command-line), then send the decrypted file.
