In plain English
A year ago, sending a PDF to a language model meant first converting it to text yourself — stripping out the formatting, losing the tables, and hoping the model could make sense of the garbled output. Today, the three biggest API providers — Anthropic Claude, OpenAI GPT-4o, and Google Gemini — all let you attach a PDF directly to a chat message, the same way you'd attach a file to an email. The model reads both the text and the visual layout, then replies in plain language.

Think of it like handing a physical report to a very fast reader. When you give them the real document — not a blurry photocopy of just the words — they can see the tables, the charts, the column headers, and the footnotes. The model does the same thing: it processes each page both as extracted text and as an image, so nothing meaningful is lost in translation.
There are three main ways to get a PDF into a model call: inline base64 (encode the bytes and paste them into the request JSON), Files API (upload once, reference by ID in many requests), and text extraction (parse the PDF yourself, send the resulting text). OpenAI and Gemini also accept a URL that the provider fetches at request time. Each approach has a different cost profile, latency characteristic, and quality ceiling, and the right choice depends on your document type and usage pattern.
Why it matters
PDFs are the dominant format for contracts, invoices, research papers, financial filings, technical manuals, and regulatory documents. Before native PDF support, every document-processing pipeline needed a separate extraction layer — pdfplumber, PyMuPDF, Apache PDFBox, or a third-party OCR service — that added latency, cost, and failure modes of its own. Today that layer is optional, and removing it often improves quality.
- Contract review: attach a 60-page agreement and ask the model to flag non-standard clauses — no pre-processing required.
- Invoice extraction: upload a scanned PDF invoice and receive structured JSON with line items, amounts, and dates.
- Research Q&A: send a technical paper and ask follow-up questions page by page, with the model citing page numbers.
- Compliance checking: compare a multi-section regulation PDF against a product description and identify gaps.
- Report summarisation: feed a quarterly earnings PDF to a model and get a bullet-point summary in seconds.
Getting the input format right is the critical first step. An incorrectly encoded payload returns a cryptic 400 error; a missing beta header on the Claude Files API is rejected with a 400 as well; an oversized inline payload trips a request-body limit that is separate from the per-file limit. This guide shows how to avoid all three.
How it works
When a provider receives a PDF, it does two things simultaneously: it extracts the raw text layer (if the PDF has one) and it renders each page as an image. Both representations are tokenised and fed into the model together, which is why native PDF input handles scanned documents and PDFs with embedded charts far better than text-only extraction.
The three delivery methods compared
| Method | How it works | Best for | Watch out for |
|---|---|---|---|
| Inline base64 | PDF bytes encoded as a base64 string, sent inside the JSON request body | One-off calls, local files, no hosting needed | Inflates payload ~33%; 32 MB cap on Claude and OpenAI |
| Files API (upload once) | Upload the PDF to the provider's storage, receive a file_id, reference it in any future request | Reusing the same document across many calls; multi-turn chats | Extra upload step; beta header required for Claude; files expire |
| URL input (OpenAI / Gemini) | Pass a publicly accessible HTTPS URL; provider fetches the PDF at request time | PDFs already hosted on a CDN or public storage bucket | URL must be reachable from provider servers; secrets in URLs are exposed |
| Text extraction (DIY) | Parse the PDF yourself with a library, send the resulting text as a normal string | Text-heavy docs where layout doesn't matter; lowest cost | Loses charts, tables, and scanned pages; adds pipeline complexity |
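
The ~33% inflation in the inline base64 row can be checked before a request is built. The following is a minimal sketch, assuming the 32 MB cap is counted in binary megabytes and that a small allowance covers the rest of the JSON body; `base64_size`, `fits_inline`, and `prompt_overhead_bytes` are illustrative names, not part of any SDK.

```python
import base64

MB = 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(raw_bytes: int, body_limit: int = 32 * MB,
                prompt_overhead_bytes: int = 64 * 1024) -> bool:
    """True if the encoded PDF plus an assumed prompt allowance fits the body cap."""
    return base64_size(raw_bytes) + prompt_overhead_bytes <= body_limit


# The closed-form size agrees with the real encoder on a sample buffer.
sample = b"x" * 1000
assert base64_size(len(sample)) == len(base64.b64encode(sample))
```

With these assumptions a 24 MB file encodes to exactly 32 MB and fails the check, while a 20 MB file passes.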
Provider limits at a glance
| Provider | Max pages | Max file size | Delivery options |
|---|---|---|---|
| Anthropic Claude | 100 pages | 32 MB per request | Inline base64, Files API (file_id) |
| OpenAI GPT-4o / o1 | 100 pages | 32 MB total across files | Inline base64, Files API (file_id), HTTPS URL |
| Google Gemini | 1,000 pages | 50 MB inline; 2 GB via Files API | Inline base64, Files API, GCS URL, HTTPS URL |
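
The limits in this table lend themselves to a pre-flight check. This is a rough sketch that copies the table's numbers into a lookup; the `PdfLimits` class and `limit_violations` function are hypothetical helpers, and the values should be re-verified against each provider's current documentation.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class PdfLimits:
    max_pages: int
    max_inline_bytes: int


# Values copied from the limits table; treat them as a snapshot, not a contract.
LIMITS = {
    "claude": PdfLimits(max_pages=100, max_inline_bytes=32 * MB),
    "openai": PdfLimits(max_pages=100, max_inline_bytes=32 * MB),
    "gemini": PdfLimits(max_pages=1000, max_inline_bytes=50 * MB),
}


def limit_violations(provider: str, pages: int, size_bytes: int) -> list[str]:
    """Return human-readable reasons a PDF would exceed the provider's limits."""
    limits = LIMITS[provider]
    problems = []
    if pages > limits.max_pages:
        problems.append(f"{pages} pages exceeds {limits.max_pages}")
    if size_bytes > limits.max_inline_bytes:
        problems.append(f"{size_bytes} bytes exceeds {limits.max_inline_bytes}")
    return problems
```

An empty list means the document passes both checks for that provider.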
Code examples: Claude, GPT-4o, and Gemini
Anthropic Claude — inline base64
Claude's Messages API accepts a document content block. Set source.type to "base64", source.media_type to "application/pdf", and supply the base64-encoded bytes in source.data. No special beta header is needed for inline PDF — that is generally available.
```python
import anthropic
import base64

# Read the PDF and base64-encode it for the JSON request body
with open("contract.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Summarise the key obligations in this contract.",
                },
            ],
        }
    ],
)
print(response.content[0].text)
```

Anthropic Claude — Files API (upload once, reuse)
For PDFs you query repeatedly — a product manual, a legal handbook, a reference specification — upload once with client.beta.files.upload() and reuse the file_id across as many calls as you like. You must include the anthropic-beta: files-api-2025-04-14 header, which the SDK passes automatically when you use the beta.files namespace.
```python
import anthropic

client = anthropic.Anthropic()

# Upload once
with open("technical-manual.pdf", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("technical-manual.pdf", f, "application/pdf"),
    )
file_id = uploaded.id  # e.g. "file_abc123"

# Reuse in every subsequent call
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": file_id,
                    },
                },
                {
                    "type": "text",
                    "text": "What does section 4 say about warranty exclusions?",
                },
            ],
        }
    ],
    betas=["files-api-2025-04-14"],
)
print(response.content[0].text)
```

OpenAI GPT-4o — Responses API with file_id
OpenAI supports PDF inputs in both the Chat Completions API and the newer Responses API. Upload with client.files.create(purpose="user_data"), then reference the returned file_id in the request. You can also send a base64 data URI or a public HTTPS URL directly.
```python
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from env

# Upload once
with open("earnings-report.pdf", "rb") as f:
    uploaded = client.files.create(file=f, purpose="user_data")
file_id = uploaded.id  # e.g. "file-abc123"

# Reference in a Responses API call
response = client.responses.create(
    model="gpt-4o",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file_id,
                },
                {
                    "type": "input_text",
                    "text": "What was the net revenue for Q3?",
                },
            ],
        }
    ],
)
print(response.output_text)
```

Google Gemini — inline base64 and Files API
Gemini accepts PDFs either inline (base64 with mime_type: "application/pdf") or via the Files API for larger documents. The inline limit is 50 MB; for larger PDFs, upload with genai.upload_file() (client.files.upload() in the newer google-genai SDK). Gemini also accepts direct HTTPS URLs and Google Cloud Storage (gs://) URIs.
```python
import google.generativeai as genai
import base64

genai.configure(api_key="YOUR_GEMINI_API_KEY")

# --- Option A: inline base64 (up to 50 MB) ---
with open("research-paper.pdf", "rb") as f:
    pdf_bytes = f.read()

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    {
        "inline_data": {
            "mime_type": "application/pdf",
            "data": base64.b64encode(pdf_bytes).decode(),
        }
    },
    "List the three main findings of this paper.",
])
print(response.text)

# --- Option B: Files API (for larger or reused PDFs) ---
uploaded = genai.upload_file("large-report.pdf", mime_type="application/pdf")
response2 = model.generate_content([
    uploaded,
    "Summarise the executive summary section.",
])
print(response2.text)
```

Native PDF input vs. text extraction: when to use each
Native PDF input is not always the right call. Sending a PDF activates the model's visual pipeline, which reads every page as an image. That is powerful for complex layouts, but it costs more tokens than sending clean text, and it runs slower. For high-volume pipelines or simple text-heavy documents, pre-extracting the text with a library like pdfplumber, PyMuPDF (fitz), or Microsoft's open-source markitdown can cut costs by 50% or more while delivering equally good results.
| Scenario | Recommended approach | Reason |
|---|---|---|
| Scanned PDF (image-only, no text layer) | Native PDF input | No text to extract; model must use vision |
| PDF with complex tables, charts, or mixed layout | Native PDF input | Visual pipeline preserves layout context that text strips away |
| Text-heavy PDF with a clean text layer (contract, paper) | Text extraction (pdfplumber / PyMuPDF) | 50%+ token savings; extraction quality is equivalent |
| Same PDF queried many times | Files API upload (any provider) | Upload cost paid once; no repeated encoding overhead |
| Large PDF > 32 MB / > 100 pages (Claude / OpenAI) | Gemini Files API or chunked text extraction | Claude and OpenAI hard limits; Gemini supports up to 1,000 pages |
| RAG pipeline with many small chunks | Text extraction + embedding | Chunk boundaries controlled by you; cheaper than per-chunk PDF calls |
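
The scenarios in this table can be folded into a small routing function. This is only a sketch: the order in which the rules are checked is an assumption (hard limits first, then visual needs, then cost), and `recommend_approach` and its parameters are hypothetical names.

```python
def recommend_approach(*, has_text_layer: bool, layout_matters: bool,
                       pages: int, size_mb: float, reused: bool = False) -> str:
    """Pick a delivery method using the table's scenarios (illustrative ordering)."""
    # Hard limits for Claude / OpenAI come first: nothing else matters if they fail.
    if pages > 100 or size_mb > 32:
        return "Gemini Files API or chunked text extraction"
    # Scans and layout-heavy documents need the visual pipeline.
    if not has_text_layer or layout_matters:
        return "Files API upload" if reused else "Native PDF input"
    # Clean, text-heavy documents are cheaper as extracted text.
    return "Text extraction"


print(recommend_approach(has_text_layer=True, layout_matters=False,
                         pages=40, size_mb=3))
```

A real pipeline would likely add more signals, such as query volume or latency targets, before committing to one route.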
Token cost reality check
Native PDF input tokenises each page twice — once as text tokens, once as visual patch tokens. A typical dense A4 page runs to roughly 1,500–2,500 tokens in combined form. A 50-page PDF can therefore consume 75,000–125,000 input tokens before you've written a single word of your prompt. At Claude Sonnet pricing of $3 per million input tokens, that is roughly $0.23–$0.38 per document query. Extracting the same text with PyMuPDF and sending it as a plain string typically halves that token count.
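
The arithmetic above is easy to reproduce for other page counts and prices. A minimal sketch, assuming the same 1,500 to 2,500 tokens-per-page range and the $3 per million input tokens figure quoted in the text; `estimate_input_cost` is an illustrative helper, and real token counts vary by document.

```python
def estimate_input_cost(pages: int,
                        tokens_per_page: tuple[int, int] = (1500, 2500),
                        usd_per_million_tokens: float = 3.0):
    """Return ((low_tokens, high_tokens), (low_usd, high_usd)) for one query."""
    low_tokens = pages * tokens_per_page[0]
    high_tokens = pages * tokens_per_page[1]
    low_usd = low_tokens * usd_per_million_tokens / 1_000_000
    high_usd = high_tokens * usd_per_million_tokens / 1_000_000
    return (low_tokens, high_tokens), (low_usd, high_usd)


tokens, cost = estimate_input_cost(50)
print(tokens, [round(c, 3) for c in cost])
```

Passing a lower per-page range gives a quick comparison against the extracted-text route.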
Common pitfalls and how to avoid them
- Missing beta header on Claude Files API. Referencing a file_id without the files-api-2025-04-14 beta header causes a 400. Always use client.beta.messages.create() and pass betas=["files-api-2025-04-14"].
- Confusing per-file size with request-body size. Claude and OpenAI both have a 32 MB limit, but that is the request body limit, not a per-file limit. A 24 MB PDF becomes approximately 32 MB after base64 encoding, leaving almost no room for the prompt. Use the Files API for any PDF over ~20 MB.
- Password-protected PDFs. Encrypted PDFs that require a password to open will fail silently or return garbage. Decrypt with pikepdf or qpdf before sending.
- Sending PDF via image block instead of document block (Claude). Claude requires a document block with media_type: "application/pdf". Using an image block with PDF data is rejected.
- Private or expiring URLs. When using URL-based input on OpenAI or Gemini, the provider's servers must be able to fetch the URL at request time. A signed S3 URL that expires in 60 seconds, or a CDN behind IP allowlisting, will return a 403.
- Assuming page order is preserved in extraction. When you extract text yourself with some libraries, multi-column layouts scramble reading order. If coherent reading order matters, test your extraction output before assuming it is correct.
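
Two of these pitfalls (non-PDF payloads and encrypted files) can be caught with a cheap byte-level check before anything is sent. The sketch below is a heuristic only: it assumes the /Encrypt key is visible in the raw bytes, which holds for typical files but is not guaranteed, and `pdf_preflight` is a hypothetical helper. A robust check would open the file with a PDF library such as pikepdf.

```python
def pdf_preflight(data: bytes) -> list[str]:
    """Cheap checks on raw PDF bytes before encoding or uploading."""
    problems = []
    # PDF files start with "%PDF-"; some writers prepend a few junk bytes,
    # so search the first 1 KB instead of only offset 0.
    if b"%PDF-" not in data[:1024]:
        problems.append("not a PDF (missing %PDF- header)")
    # Encrypted PDFs reference an /Encrypt dictionary from the trailer.
    # Heuristic: a plain byte search can miss or over-report in unusual files.
    if b"/Encrypt" in data:
        problems.append("looks encrypted; decrypt before sending")
    return problems


print(pdf_preflight(b"%PDF-1.7\ntrailer << /Root 1 0 R /Encrypt 5 0 R >>"))
```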
Going deeper
Once the basic pipeline is working, several advanced patterns become worth exploring.
Citations and page-level grounding
Anthropic's Citations feature (available in Amazon Bedrock and the direct API) lets the model return structured citations that point to the exact passage in the source document it drew from. This is especially valuable for legal, regulatory, and medical use cases where you need an audit trail. Set citations: {enabled: true} on each document block in the request, and cited passages in the response carry a document_index plus, for PDFs, a start_page_number and end_page_number.
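
As a rough illustration of page-level grounding, the sketch below builds a document block with citations switched on and pulls page references out of a response. It assumes citations are enabled per document block and that PDF citations arrive as page_location entries on text blocks; both helper names are hypothetical, and the response is handled as plain dicts shaped like the JSON payload rather than SDK objects.

```python
def pdf_block_with_citations(pdf_base64: str, title: str) -> dict:
    """Document content block for a base64 PDF with citations enabled."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": pdf_base64,
        },
        "title": title,
        "citations": {"enabled": True},
    }


def page_citations(content_blocks: list[dict]) -> list[tuple[int, int, str]]:
    """Collect (document_index, start_page_number, cited_text) from a response."""
    found = []
    for block in content_blocks:
        # Text blocks without citations may omit the key or set it to None.
        for cite in block.get("citations") or []:
            if cite.get("type") == "page_location":
                found.append((cite["document_index"],
                              cite["start_page_number"],
                              cite["cited_text"]))
    return found
```

Storing these tuples alongside the answer gives a simple audit trail back to the source pages.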
Multi-document comparison
All three providers support multiple document blocks in a single message. You can compare two contracts, diff two versions of a specification, or cross-reference a regulation against a product policy in one round-trip. Add each PDF as a separate content block and label them in your prompt (Document A, Document B) so the model can reference them unambiguously. Watch the token budget — two 50-page PDFs at 2,000 tokens per page will cost 200,000 input tokens per query.
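
The labelling advice can be sketched as a helper that interleaves a text label with each document block. This reuses the Claude file_id block shape from the Files API example earlier; `labelled_documents` is a hypothetical name, and the sketch handles at most 26 documents because it labels them A to Z.

```python
import string


def labelled_documents(file_ids: list[str], question: str) -> list[dict]:
    """Build one user-message content list with PDFs labelled Document A, B, ..."""
    content = []
    # zip stops at the shorter input, so anything beyond 26 files is dropped.
    for letter, file_id in zip(string.ascii_uppercase, file_ids):
        content.append({"type": "text", "text": f"Document {letter}:"})
        content.append({
            "type": "document",
            "source": {"type": "file", "file_id": file_id},
        })
    content.append({"type": "text", "text": question})
    return content


content = labelled_documents(
    ["file_old", "file_new"],
    "List the clauses that differ between Document A and Document B.",
)
print(len(content))
```

The returned list would be used as the content of a single user message.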
Batch processing for high volume
When processing hundreds of PDFs, use the provider's Batch API rather than synchronous calls. Both Anthropic and OpenAI offer batch endpoints that run requests asynchronously (typically completing within 24 hours) and bill them at roughly half the standard rate. OpenAI takes an uploaded JSONL file of requests; Anthropic takes a list of requests in the call body. Upload your PDFs to the Files API first (paying the upload cost once), then reference the same file_id across thousands of batch items, so you are not re-sending the bytes for each job.
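
To make the "same file_id across many batch items" idea concrete, here is a sketch that writes OpenAI-style batch lines targeting the Chat Completions endpoint. The `batch_lines` helper is hypothetical, and whether batch jobs accept file content parts exactly as synchronous calls do is an assumption to verify against the provider's documentation.

```python
import json


def batch_lines(file_id: str, questions: dict[str, str],
                model: str = "gpt-4o") -> str:
    """Return JSONL text with one request per question, all reusing one file_id."""
    lines = []
    for custom_id, question in questions.items():
        request = {
            "custom_id": custom_id,  # used to match results back to requests
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "file", "file": {"file_id": file_id}},
                        {"type": "text", "text": question},
                    ],
                }],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines)


jsonl = batch_lines("file-abc123", {
    "q1": "What was the net revenue for Q3?",
    "q2": "List the risk factors mentioned.",
})
print(jsonl.count("\n") + 1)
```

The resulting text would be saved to a .jsonl file and uploaded as the batch input.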
Hybrid pipeline: OCR first, model second
For very large document archives where both cost and latency matter, consider a staged approach: run a lightweight OCR tool (Tesseract, AWS Textract, Google Document AI) to extract text and table structure from PDFs, then send the clean Markdown output to the LLM. The LLM call becomes a straightforward text-in, text-out task — fast and cheap. Reserve native PDF input for documents where the OCR quality is inadequate or where the visual layout is itself meaningful (engineering drawings, complex financial statements).
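
Deciding when extracted text is "inadequate" needs some rule. One simple possibility, sketched here, is to count pages with very little extracted text and fall back to native PDF input when too many are sparse. The thresholds and the `needs_native_pdf` name are assumptions for illustration, not recommended values.

```python
def needs_native_pdf(page_texts: list[str],
                     min_chars_per_page: int = 200,
                     max_sparse_fraction: float = 0.2) -> bool:
    """True when too many pages have little extracted text (likely scans or figures)."""
    if not page_texts:
        # Nothing was extracted at all, so only the visual route can work.
        return True
    sparse = sum(1 for text in page_texts
                 if len(text.strip()) < min_chars_per_page)
    return sparse / len(page_texts) > max_sparse_fraction


pages = ["lorem ipsum " * 50] * 7 + ["", "  ", "Fig. 3"]
print(needs_native_pdf(pages))
```

The per-page texts would come from whichever OCR or extraction tool the first stage uses.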
Structured output extraction from PDFs
Combine native PDF input with a JSON schema prompt to turn any document into a typed data structure. Describe the fields you want, for example {"invoice_number": "string", "line_items": [{"description": "string", "amount": "number"}], "total": "number"}, instruct the model to return only valid JSON, then pipe the output to json.loads(). For OpenAI, response_format: {type: "json_schema", json_schema: {...}} in Chat Completions (text.format in the Responses API) enforces the schema at the API level, which removes most parse failures; refusals and output truncated by the token limit still need handling.
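
Even with a schema in the prompt, it is worth validating what comes back before it reaches downstream code. The following sketch parses the model output with json.loads() and checks it against the invoice shape described above; `parse_invoice` is an illustrative helper, and a schema library could replace the hand-written checks.

```python
import json


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_invoice(raw: str) -> dict:
    """Parse model output and verify the invoice fields and their types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("invoice_number"), str):
        raise ValueError("invoice_number must be a string")
    if not _is_number(data.get("total")):
        raise ValueError("total must be a number")
    items = data.get("line_items")
    if not isinstance(items, list):
        raise ValueError("line_items must be a list")
    for item in items:
        if not (isinstance(item, dict)
                and isinstance(item.get("description"), str)
                and _is_number(item.get("amount"))):
            raise ValueError(f"bad line item: {item!r}")
    return data
```

Because json.JSONDecodeError subclasses ValueError, a single except ValueError around the call covers both malformed JSON and shape mismatches.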
FAQ
Does Claude's API support sending PDFs directly, or do I have to convert them first?
Claude supports PDFs natively. You can send a PDF as an inline base64 document block without any pre-conversion. For PDFs you query repeatedly, Claude's Files API lets you upload once and reference the file by ID in every subsequent call, which is more efficient.
What is the difference between the Files API and sending a base64-encoded PDF?
With base64 inline, the PDF bytes travel with every API request — convenient for one-off calls but expensive for repeated queries because you pay the encoding overhead each time. With the Files API, you upload the PDF once, get a persistent file_id, and reference that ID in as many requests as you like. The Files API also lets you stay well below request-body size limits for large documents.
How many pages can I send to the API at once?
Claude and OpenAI both cap PDF inputs at 100 pages per request. Google Gemini supports up to 1,000 pages inline or via the Files API. If your document exceeds these limits, split it into chunks by page range before sending, or pre-extract the text.
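
Splitting by page range starts with computing the ranges. A minimal sketch, assuming 1-based inclusive page numbers and the 100-page cap mentioned above; `page_ranges` is an illustrative helper, and the actual slicing of the PDF would be done with a PDF library of your choice.

```python
def page_ranges(total_pages: int, max_pages: int = 100) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into consecutive inclusive ranges of at most max_pages."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [(start, min(start + max_pages - 1, total_pages))
            for start in range(1, total_pages + 1, max_pages)]


print(page_ranges(250))
```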
When should I extract PDF text myself instead of sending the raw PDF?
Text extraction makes sense for documents with a clean embedded text layer — contracts, research papers, plain reports — where the visual layout is not meaningful. It can cut token costs by 50% or more. Stick with native PDF input for scanned documents, PDFs with charts or complex tables, or any case where losing the visual context degrades the model's output quality.
Why am I getting a 400 error when I send a PDF to the Claude API?
The two most common causes are (1) using the wrong block type — PDFs need a document block, not an image block — and (2) a missing beta header when using the Files API. If you are referencing a file_id, you must call client.beta.messages.create() and include betas=["files-api-2025-04-14"]. Also check that the total request body (prompt + encoded PDF) does not exceed 32 MB.
Can I send a password-protected PDF to the API?
No. All three providers will either reject an encrypted PDF or return empty or garbled content. Decrypt the PDF first using a library like pikepdf (Python) or qpdf (command-line), then send the decrypted file.