What Is the Claude API? A Beginner's Guide to Building with Anthropic

Go from zero to your first successful Claude API call, understanding the Messages format, model lineup, and pricing along the way.

Beginner · 12 min read · Updated 2026-06-11

In plain English

The Claude API is how your code talks to Claude, Anthropic's family of AI models. When you chat with Claude in a browser, a person is typing. The API is the same brain, but a program sends the message and a program reads the answer — no human in the loop.

Think of it like a vending machine for intelligence. You put in a request (some text, maybe an image, and a few settings), and out comes a response (Claude's reply). You don't see the machinery inside — Anthropic runs the model on its own servers. You just send a request over the internet and get JSON back.

Concretely: you send an HTTP request to https://api.anthropic.com, carrying your question and a secret key that proves you're allowed to use it. Anthropic runs the model, then sends back the answer. That's the whole loop. If the idea of "calling an AI model over HTTP" is new, the LLM API basics guide walks through the general pattern that every provider — Anthropic, OpenAI, Google — follows.

Why it matters

The web chat is great for one person typing one question. But the moment you want Claude inside something — a customer-support bot, a tool that summarizes 10,000 documents overnight, a feature in your app that drafts emails — you need the API. A human can't sit there pasting text into a browser at that scale. Code can.

The API is what turns "Claude is a clever assistant" into "Claude is a building block I can ship." It's the difference between using a calculator and wiring a calculator chip into your own device.

Who should care

App developers adding an AI feature — a chatbot, a writing helper, smart search.
Data folks running bulk jobs — classify, extract, or summarize thousands of records.
Automation builders wiring Claude into scripts, internal tools, or workflows.
Anyone learning AI engineering — the Messages API is the foundation almost everything else (agents, RAG, tool use) is built on.

What did it replace? Mostly nothing — there was no "send text to a frontier model and get a thoughtful answer" before this generation of APIs existed. What it displaces is a huge amount of hand-written logic: brittle rules engines, keyword matchers, and templated responses that never quite worked. A single API call can now do what used to take a team months.

How it works

Every request goes to a single endpoint: POST /v1/messages. You hand it three essential things and Claude hands back a response. Here's the round trip:

// One Claude API request, start to finish

Your code: builds a request (model, max_tokens, messages)
→ HTTPS + API key: sent to api.anthropic.com
→ Anthropic runs Claude: the model generates a reply
→ JSON response: content, usage, stop_reason
→ Your code reads it: pull out the text and use it

The three things every request needs

model — which Claude to use (e.g. claude-opus-4-8). More on the lineup below.
max_tokens — the most tokens Claude is allowed to generate in its reply. A token is a chunk of text, roughly ¾ of a word; see what is a token. This is a hard ceiling, not a target.
messages — the conversation so far, as a list of turns. Each turn has a role ("user" or "assistant") and content (the text).
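
To make the three fields concrete, here is a minimal sketch of the raw HTTP request that an SDK would assemble for you. It only builds the request and does not send it. The helper name build_request is hypothetical, and the header set shown (x-api-key, anthropic-version, content-type) is an assumption drawn from Anthropic's public API reference rather than from this guide.

```python
import json
import os

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(prompt, model="claude-opus-4-8", max_tokens=1024, api_key=None):
    """Return (url, headers, body_bytes) for one Messages API call, without sending it."""
    headers = {
        # The secret key; falls back to the environment variable if not passed in.
        "x-api-key": api_key or os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",  # assumed API version header value
        "content-type": "application/json",
    }
    payload = {
        "model": model,            # which Claude to use
        "max_tokens": max_tokens,  # hard ceiling on the reply length
        "messages": [{"role": "user", "content": prompt}],  # the conversation so far
    }
    return API_URL, headers, json.dumps(payload).encode("utf-8")


url, headers, body = build_request(
    "Explain photosynthesis in one sentence.", api_key="sk-ant-..."
)
print(url)
print(json.loads(body)["messages"])
```

Any HTTP client can POST that body with those headers. In practice the official SDK does this for you, which is why the rest of this guide uses it.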

The Messages format

The messages list is the heart of the API. The simplest version is one user turn:

messages (single turn) · json

[
  { "role": "user", "content": "Explain photosynthesis in one sentence." }
]

Claude replies with an assistant turn. To continue a conversation, you append its reply and your next question to the same list and send the whole thing again. This is the part beginners trip on: the API is stateless. It remembers nothing between calls. There's no session on Anthropic's side holding your chat history. If you want Claude to "remember" turn 1 when you send turn 5, you resend turns 1 through 4 every time.

// Stateless: what you send vs. what Claude knows

What you send each call

The full message history
Your system prompt (if any)
Model + settings

What Claude remembers between calls

Nothing.
No saved session.
Every request starts fresh.

There's also an optional system prompt — a separate instruction that sets Claude's role and rules for the whole conversation ("You are a terse SQL expert. Never explain unless asked."). It isn't a message turn; it's a top-level field. Good system prompts are their own craft — see prompt engineering basics.
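
Because the API keeps no state, your code has to own the history. The sketch below is one possible way to do that; the Conversation class and its method names are hypothetical, not part of any SDK. It shows the system prompt traveling as a top-level field while the turns accumulate in messages.

```python
class Conversation:
    """Keeps the history that the API will not keep for you."""

    def __init__(self, system=None):
        self.system = system
        self.messages = []

    def add(self, role, text):
        if role not in ("user", "assistant"):
            raise ValueError("role must be 'user' or 'assistant'")
        self.messages.append({"role": role, "content": text})

    def request_kwargs(self, model, max_tokens=1024):
        """Keyword arguments for the next call: the FULL history, every time."""
        kwargs = {
            "model": model,
            "max_tokens": max_tokens,
            "messages": list(self.messages),
        }
        if self.system:
            kwargs["system"] = self.system  # top-level field, not a message turn
        return kwargs


chat = Conversation(system="You are a terse SQL expert. Never explain unless asked.")
chat.add("user", "My favorite number is 7.")
chat.add("assistant", "Noted.")
chat.add("user", "What is my favorite number times 6?")

kwargs = chat.request_kwargs("claude-opus-4-8")
print(len(kwargs["messages"]))  # 3
```

With the Python SDK, the result could be passed along as client.messages.create(**kwargs), after which you would add the reply with chat.add("assistant", ...).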

What comes back

The response is JSON. The three fields you'll touch most: content (a list of blocks; for a plain text reply the text lives in content[0].text, though with tool use or thinking enabled the first block may be a different type), stop_reason (why Claude stopped: usually "end_turn", or "max_tokens" if it ran out of room), and usage (the input and output token counts you were billed for).
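
As a rough illustration of reading those three fields, the sketch below works on a plain dictionary shaped like the JSON described above. The sample values (reply text and token counts) are invented for the example, and summarize_response is a hypothetical helper name.

```python
# Illustrative response body; the text and token counts are made up.
sample_response = {
    "content": [
        {"type": "text", "text": "Plants turn light, water, and CO2 into sugar and oxygen."}
    ],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 15, "output_tokens": 14},
}


def summarize_response(resp):
    """Pull out the reply text, whether it was cut off, and the token total."""
    # Join every text block instead of assuming the first block is text.
    text = "".join(b["text"] for b in resp["content"] if b.get("type") == "text")
    usage = resp["usage"]
    return {
        "text": text,
        "truncated": resp["stop_reason"] == "max_tokens",
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


summary = summarize_response(sample_response)
print(summary["text"])
print(summary["truncated"], summary["total_tokens"])
```

Filtering on the block type is a small defensive choice: it keeps working if the reply contains blocks other than text.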

Your first call, step by step

Let's go from nothing to a working request. Four steps.

1. Get an API key. Sign up at the Anthropic Console, open Settings, and create a key. It looks like sk-ant-.... Treat it like a password: anyone who has it can spend your money.
2. Store the key in an environment variable, never in your code. The SDK reads ANTHROPIC_API_KEY automatically. On macOS/Linux: export ANTHROPIC_API_KEY=sk-ant-...
3. Install the SDK. Anthropic ships official libraries. For Python: pip install anthropic.
4. Send the request and read the reply.

setup · bash

pip install anthropic
export ANTHROPIC_API_KEY=sk-ant-...   # your real key

first_call.py · python

from anthropic import Anthropic

# Reads ANTHROPIC_API_KEY from the environment automatically.
client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain photosynthesis in one sentence."}
    ],
)

# content is a list of blocks; the text reply is in the first block.
print(response.content[0].text)
print("Tokens used:", response.usage.input_tokens, "in /", response.usage.output_tokens, "out")

Run it and you'll see a one-sentence answer plus a token count. That's a complete Claude integration — everything else is variations on this.

Adding a system prompt and a second turn

multi_turn.py · python

from anthropic import Anthropic

# Reads ANTHROPIC_API_KEY from the environment automatically.
client = Anthropic()

messages = [
    {"role": "user", "content": "My favorite number is 7."},
]

first = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a friendly math tutor. Keep answers short.",
    messages=messages,
)

# Append Claude's reply, then ask a follow-up. Resend the FULL history.
messages.append({"role": "assistant", "content": first.content[0].text})
messages.append({"role": "user", "content": "What is my favorite number times 6?"})

second = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a friendly math tutor. Keep answers short.",
    messages=messages,
)
print(second.content[0].text)  # Knows it's 42, because you resent turn 1.

The model lineup and how billing works

"Claude" isn't one model — it's a family, and you pick one per request via the model field. They trade intelligence against speed and cost. As a rule of thumb: bigger model, smarter and pricier; smaller model, faster and cheaper.

Tier	Good for	Trade-off
Opus	Hard reasoning, long agentic tasks, top quality	Most capable, highest cost
Sonnet	Balanced everyday work — chat, coding, extraction	Strong all-rounder, mid cost
Haiku	High-volume, simple, latency-sensitive tasks	Fastest and cheapest, less depth

A common beginner mistake is reaching for the biggest model for everything. Don't. Match the model to the job: use a fast tier for classification or simple formatting, and save the heavyweight tier for genuinely hard reasoning. You can always start small and move up if quality isn't there.
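
The "start small and move up" advice can be sketched as a simple escalation loop. Everything here is an assumption made for illustration: ask is a hypothetical callable that sends the prompt to a given tier, good_enough is your own quality check, and the tier labels are placeholders rather than real model IDs.

```python
def answer_with_escalation(ask, good_enough, prompt, tiers=("haiku", "sonnet", "opus")):
    """Try the cheapest tier first; move up only if the answer fails the check."""
    tier, answer = None, None
    for tier in tiers:
        answer = ask(tier, prompt)
        if good_enough(answer):
            break  # stop at the first tier that passes
    # If no tier passes, this returns the top tier's attempt.
    return tier, answer


# Stand-in functions so the sketch runs without network access.
def fake_ask(tier, prompt):
    return "a detailed answer" if tier != "haiku" else ""


used_tier, result = answer_with_escalation(fake_ask, bool, "Classify this ticket.")
print(used_tier, "->", result)
```

Note that an escalated request pays for every attempt, so this pattern suits workloads where the small tier succeeds most of the time.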

How you're billed

Pricing is per token, and split into two rates: a lower price for input tokens (everything you send — your messages, system prompt, and history) and a higher price for output tokens (what Claude generates). Output is typically several times more expensive than input.

// What you pay for in one request

Total cost

Input tokens: messages + system + history (cheaper rate)

Output tokens: Claude's reply (pricier rate)
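
The two-rate split reduces to one line of arithmetic. In the sketch below, the rates are placeholders chosen only to make the numbers easy to follow; they are not Anthropic's actual prices, which vary by model and change over time.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_mtok, output_rate_per_mtok):
    """Dollar cost of one request, given rates in dollars per million tokens."""
    return (
        input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    ) / 1_000_000


# PLACEHOLDER rates: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_rate_per_mtok=3.0,
    output_rate_per_mtok=15.0,
)
print(f"${cost:.4f}")  # $0.0135
```

Plug in the token counts from response.usage and the current published rates for your chosen model to get a real figure.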

The catch with stateless conversations: because you resend the whole history every turn, your input tokens grow with every message. A 20-turn chat charges you for turn 1's text twenty times over. This is why long conversations get expensive — and why features like prompt caching exist to discount repeated context. The full breakdown of input, output, and cached pricing lives in how LLM API pricing works.
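
To see how quickly resent history adds up, here is a small sketch that tallies the input tokens billed on each call. It assumes every user message and every reply has a fixed size (50 and 150 tokens, both invented), and it ignores the system prompt and any caching discount.

```python
def input_tokens_per_turn(turns, user_tokens, reply_tokens):
    """Input tokens billed on each call when every call resends the full history."""
    billed = []
    history = 0
    for _ in range(turns):
        history += user_tokens   # the new user message joins the history
        billed.append(history)   # the whole history is sent as input
        history += reply_tokens  # Claude's reply joins the history too
    return billed


per_call = input_tokens_per_turn(turns=20, user_tokens=50, reply_tokens=150)
print(per_call[0], per_call[-1], sum(per_call))  # 50 3850 39000
```

The last call alone sends 77 times as many input tokens as the first, and the 20 calls together bill 39,000 input tokens for a conversation that contains only 1,000 tokens of user text.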

Common beginner mistakes

These are the snags that cost new builders the most time. Knowing them up front saves hours.

Expecting the API to remember

The single biggest one. The API is stateless — if Claude "forgot" what you said earlier, it's because you didn't resend it. Keep your own message list and pass the full history each call.

Setting max_tokens too low

max_tokens caps the reply length. Set it too low and Claude gets cut off mid-sentence — you'll see stop_reason: "max_tokens". Always check the stop reason; if it's "max_tokens", raise the cap and try again. It does not make Claude aim for that length; it's purely a ceiling.
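
One way to act on a truncated reply is to retry with a larger cap. This is a sketch under stated assumptions: create_with_room is a hypothetical helper, create is any callable with the same keyword interface as client.messages.create, and the starting cap and ceiling are arbitrary.

```python
from types import SimpleNamespace


def create_with_room(create, max_tokens=256, ceiling=4096, **kwargs):
    """Call `create`, doubling max_tokens while the reply comes back truncated."""
    while True:
        response = create(max_tokens=max_tokens, **kwargs)
        if response.stop_reason != "max_tokens" or max_tokens >= ceiling:
            return response  # finished naturally, or we hit our own ceiling
        max_tokens = min(max_tokens * 2, ceiling)


# Stand-in for the API: pretends the full answer needs 1,000 tokens.
caps_tried = []


def fake_create(max_tokens, **kwargs):
    caps_tried.append(max_tokens)
    reason = "end_turn" if max_tokens >= 1000 else "max_tokens"
    return SimpleNamespace(stop_reason=reason)


final = create_with_room(fake_create, messages=[{"role": "user", "content": "Hi"}])
print(caps_tried, final.stop_reason)  # [256, 512, 1024] end_turn
```

Each retry is a separate billed request, so if truncation is common it is usually cheaper to start with a more generous cap.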

Leaking your API key

Never hardcode sk-ant-... into source, and never commit it to GitHub. Bots scan public repos for keys within minutes. Use an environment variable. If a key leaks, revoke it in the Console immediately.
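
A small sketch of the environment-variable habit: fail fast when the key is missing, and never print the whole key in logs. The helper names load_api_key and redact are hypothetical.

```python
import os


def load_api_key(env=os.environ):
    """Read the key from the environment and fail loudly if it is missing."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it before running.")
    return key


def redact(key):
    """Safe form for logs: keep a short prefix, hide the rest."""
    return key[:7] + "..." if len(key) > 7 else "..."


# A fake environment and a fake key, so the sketch runs anywhere.
demo_key = load_api_key({"ANTHROPIC_API_KEY": "sk-ant-not-a-real-key"})
print(redact(demo_key))  # sk-ant-...
```

Passing the environment in as a parameter is only there to make the function easy to test; in real code you would call load_api_key() with no arguments.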

Not handling errors and rate limits

Real apps hit bumps: a 429 when you send too fast, a 529 when Anthropic is overloaded, a 400 for a malformed request. The good news — the SDK retries transient errors (429, 5xx) automatically with backoff. Wrap calls in try/except for the rest, and branch on the error type rather than parsing error strings.

error_handling.py · python

import anthropic
from anthropic import Anthropic

client = Anthropic()

try:
    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    if response.stop_reason == "max_tokens":
        print("Reply was truncated — raise max_tokens and retry.")
    print(response.content[0].text)
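except anthropic.RateLimitError:
    print("Slow down: too many requests. The SDK already retried; back off.")
except anthropic.APIConnectionError as e:
    # Network-level failure: there is no HTTP status code to report.
    print(f"Could not reach the API: {e}")
except anthropic.APIStatusError as e:
    # Any other non-2xx response (400, 401, 404, 529, ...).
    print(f"API error {e.status_code}: {e.message}")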

Code	Means	What to do
401	Bad or missing API key	Check ANTHROPIC_API_KEY is set correctly
404	Unknown model ID	Fix the model string (e.g. claude-opus-4-8)
429	Rate limited	Slow down; SDK retries with backoff
529	Anthropic overloaded	Retry later, or fall back to another model
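
If the SDK's built-in retries are exhausted, or you are calling the endpoint without the SDK, the table above can be turned into a retry policy. The sketch below is one common approach (exponential backoff with jitter), not a description of what the SDK does internally; the function names and default timings are assumptions.

```python
import random


def should_retry(status_code):
    """Retry rate limits and server-side errors; never retry client mistakes."""
    return status_code == 429 or 500 <= status_code < 600


def backoff_delays(attempts, base=1.0, cap=30.0, rng=random.random):
    """Delays in seconds: exponential growth up to `cap`, scaled by 50-100% jitter."""
    return [min(cap, base * 2 ** n) * (0.5 + rng() / 2) for n in range(attempts)]


for code in (401, 404, 429, 529):
    print(code, "retry" if should_retry(code) else "fix the request")

# With jitter pinned to its maximum, the schedule is easy to read.
print(backoff_delays(4, rng=lambda: 1.0))  # [1.0, 2.0, 4.0, 8.0]
```

The jitter spreads retries out so that many clients hitting the same limit do not all come back at the same instant.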

Going deeper

Once your first call works, the same POST /v1/messages endpoint unlocks everything serious AI apps are built on. None of these are separate APIs — they're extra fields on the request you already know.

Streaming

By default you wait for the whole reply, then get it all at once. For a long answer that can mean staring at a blank screen for many seconds. Streaming sends the text token-by-token as it's generated — the typewriter effect you see in chat UIs. It's also the safe choice for any request with a large max_tokens, because it avoids hitting HTTP timeouts on slow, lengthy generations. Set stream=True (or use the SDK's stream helper). The mechanics — server-sent events, partial chunks — are covered in streaming explained.
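
A minimal sketch of the streaming helper mentioned above, assuming the Python SDK's client.messages.stream(...) context manager and its text_stream iterator. The wrapper function stream_reply is hypothetical; client is expected to be an Anthropic() instance created by the caller.

```python
def stream_reply(client, prompt, model="claude-opus-4-8", max_tokens=1024):
    """Print text as it arrives and return the full reply once the stream ends."""
    pieces = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # typewriter effect
            pieces.append(text)
    print()  # finish the line after the last chunk
    return "".join(pieces)
```

Usage would look like stream_reply(Anthropic(), "Write a haiku about rain."). Collecting the pieces as they arrive means you still end up with the complete reply for logging or for appending to the history.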

Tool use (function calling)

Claude can't check today's weather or query your database on its own. Tool use lets you describe functions Claude is allowed to call; when it decides one is needed, it returns a structured request ("call get_weather with city='Paris'"), your code runs it, and you feed the result back. This is the engine behind agents and live-data apps. Start with what is function calling.
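
The round trip described above can be sketched with plain dictionaries shaped like the raw JSON. The tool definition fields (name, description, input_schema) and the tool_use and tool_result block shapes are assumptions based on Anthropic's public tool-use documentation; get_weather is a hypothetical stand-in, and the sample block ID is invented.

```python
# What you send in the request's `tools` list.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city):
    # Hypothetical stand-in; a real app would call a weather service here.
    return f"Sunny in {city}"


HANDLERS = {"get_weather": get_weather}


def answer_tool_calls(content_blocks):
    """Run each requested tool and build the tool_result blocks for the next user turn."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        output = HANDLERS[block["name"]](**block["input"])
        results.append(
            {"type": "tool_result", "tool_use_id": block["id"], "content": output}
        )
    return results


# Illustrative assistant reply asking for a tool call.
reply_blocks = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"city": "Paris"}},
]
print(answer_tool_calls(reply_blocks))
```

The returned list becomes the content of your next user message, sent along with the full history so Claude can finish its answer. With the SDK the blocks are objects rather than dictionaries, so you would read block.name and block.input instead.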

Structured outputs and extended thinking

Structured outputs force the reply to match a JSON schema you define — invaluable when another program has to parse Claude's answer reliably.
Extended (adaptive) thinking lets capable models reason internally before answering, trading some latency and tokens for better results on hard problems. It pairs with an effort setting to dial the depth up or down.
Prompt caching stores a large, unchanging prefix (a long system prompt, a reference document) so repeat requests reuse it at a fraction of the cost — the main lever against the growing-history bill mentioned earlier.
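
For prompt caching, a rough sketch of how a large fixed prefix might be marked. It assumes the system field accepts a list of text blocks and that a cache_control entry of type "ephemeral" marks the end of the cacheable prefix, as described in Anthropic's prompt caching documentation; cached_system is a hypothetical helper.

```python
def cached_system(instructions, reference_text):
    """Build a `system` value whose large, unchanging part is marked for caching."""
    return [
        {"type": "text", "text": instructions},
        {
            "type": "text",
            "text": reference_text,
            # Everything up to and including this block is eligible for reuse.
            "cache_control": {"type": "ephemeral"},
        },
    ]


system_blocks = cached_system(
    "You answer questions about the handbook below.",
    "HANDBOOK TEXT GOES HERE (imagine many pages).",
)
print(len(system_blocks), system_blocks[-1]["cache_control"])
```

You would pass system_blocks as the system argument on every request. The prefix has to be identical from call to call for the cache to apply, so keep anything that changes (such as the user's question) after it.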

Batch, files, and beyond

For non-urgent bulk jobs, the Message Batches API processes huge volumes asynchronously at a discount. The Files API lets you upload a document once and reference it across many calls instead of re-sending it. And when you outgrow single calls entirely, you move toward agents and retrieval systems — but every one of those still bottoms out in a Messages API request, so the fundamentals you just learned never stop being relevant.
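
As an illustration of the batch shape, the sketch below builds one request entry per document. It assumes each entry carries a custom_id plus a params object holding the same fields as a normal Messages request, per Anthropic's Message Batches documentation; build_batch_requests and the "doc-N" ID scheme are hypothetical.

```python
def build_batch_requests(documents, model="claude-opus-4-8", max_tokens=512):
    """One batch entry per document, each with an ID you can match results against."""
    return [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"Summarize in two sentences:\n\n{text}"}
                ],
            },
        }
        for i, text in enumerate(documents)
    ]


requests = build_batch_requests(["First report text.", "Second report text."])
print([r["custom_id"] for r in requests])  # ['doc-0', 'doc-1']
```

With the Python SDK, the list would be submitted with something like client.messages.batches.create(requests=requests). Results arrive later and may not come back in submission order, which is why each entry needs its own custom_id.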

FAQ

What is the Claude API in simple terms?

It's a way for your code (not a person in a browser) to send text to Anthropic's Claude models and get an AI-generated reply back. You make an HTTPS request to https://api.anthropic.com with your question and a secret API key, Anthropic runs the model, and returns the answer as JSON. It's how you put Claude inside an app, script, or automated workflow.

How do I get a Claude API key and make my first call?

Sign up at the Anthropic Console, go to Settings and create an API key (it starts with sk-ant-). Store it in the ANTHROPIC_API_KEY environment variable, install the SDK (pip install anthropic for Python), then call client.messages.create(...) with a model, a max_tokens value, and a messages list. The reply text is in response.content[0].text.

Does the Claude API remember previous messages?

No. The API is stateless — it keeps no memory between calls. If you want Claude to recall earlier turns in a conversation, you have to resend the entire message history with every request. There's no session stored on Anthropic's side; you manage the conversation list yourself in your own code.

How much does the Claude API cost?

You pay per token, with separate rates for input tokens (what you send) and output tokens (what Claude generates), and output costs more than input. Cost also depends on which model tier you pick — Opus is the priciest, Haiku the cheapest. Because conversations resend their full history each turn, input tokens accumulate over a long chat, which is why prompt caching exists to discount repeated context.

Which Claude model should I use as a beginner?

Match the model to the task instead of always grabbing the biggest one. Use a fast, cheap tier (Haiku) for simple, high-volume jobs like classification; a balanced tier (Sonnet) for most everyday chat and coding; and the top tier (Opus) only for genuinely hard reasoning or long agentic work. Start small and move up if the quality isn't good enough.

What's the difference between the Claude API and the Claude chat app?

Same underlying models, different access. The chat app is a website where a human types questions. The API is a programmatic interface where your software sends and receives messages over HTTP — letting you build Claude into products, run bulk jobs, and automate tasks at a scale no person could do by hand.
