In plain English
The Claude Agent SDK is a software library from Anthropic that lets you build an autonomous AI agent in a few lines of code. It's published as claude-agent-sdk on pip (Python) and @anthropic-ai/claude-agent-sdk on npm (TypeScript). The headline promise: it gives you the same tools, agent loop, and context management that power Claude Code, but as a library you call from your own program instead of a terminal app.
Here's the everyday analogy. The raw Claude API is like being handed an engine. It's powerful, but to drive anywhere you still have to build the car around it — the wheels, the steering, the fuel line. The Agent SDK hands you the whole car. You sit down, say where you want to go, and it already knows how to read files, run shell commands, search the web, and loop until the job is done. You didn't have to wire any of that up.
Concretely, an agent here means a loop: Claude looks at your request, decides to use a tool (say, read a file), sees the result, decides on the next tool, and keeps going until the task is finished. The SDK runs that loop for you. You write query(prompt="Find and fix the bug in auth.py") and Claude reads the file, spots the bug, and edits it — no tool-handling code on your side.
Why it matters
The hard part of building an agent was never calling the model. It was everything around the model: defining tools, executing them safely, feeding results back, managing a context window that fills up, and stopping when done. Most people who tried to build this from scratch on the raw API rebuilt the same plumbing — badly — over and over.
Anthropic had already solved all of that for Claude Code, their coding agent. The Agent SDK takes that battle-tested harness and exposes it as a library. So instead of writing a tool-execution loop yourself (a topic we cover in function calling and tool use), you get a production-grade one for free, plus built-in file, shell, and web tools that actually run.
Who should care
- Anyone automating a workflow Claude Code already does well — code review in CI, refactors, doc generation, repo triage — but who needs it to run unattended in a pipeline, not in an interactive terminal.
- App builders who want an agent feature (a research assistant, an SRE bot, a data-cleaning agent) without re-implementing the agent loop, permission system, and context compaction.
- Teams standardizing on Claude who want one harness across their CLI use and their production automation, so workflows translate directly between the two.
- Beginners who want to ship a working agent today and learn the internals later, rather than the other way around.
What it replaced, for many people, was a tangle of glue code: a hand-rolled while loop, a switch statement of tool handlers, ad-hoc retry logic, and a permission scheme invented on the spot. The SDK is the case against writing all that yourself — closely related to the question of whether you even need an agent framework at all.
How it works
At the center is one function: query(). You give it a prompt and some options; it returns a stream of messages as the agent works. Under the hood, query() runs the agent loop — the same loop every agent uses, just managed for you.
The loop ends when Claude has nothing left to do (or hits a limit like max_turns). Everything in between — choosing tools, executing them, handling errors, and keeping the conversation coherent — is the SDK's job. Four pieces do the heavy lifting:
Built-in tools
The SDK ships real, executable tools so your agent can act immediately. The main ones: Read, Write, and Edit for files; Bash for terminal commands and git; Glob and Grep for finding and searching files; and WebSearch and WebFetch for the live internet. You don't implement these — they run inside the SDK's harness.
Permissions
Because the agent can run commands and edit files, the SDK gives you control over what it's allowed to do. allowed_tools pre-approves a safe set (e.g. Read, Glob, Grep for a read-only analyst). A permission_mode like acceptEdits auto-approves file edits so the agent doesn't pause for confirmation. This is the safety boundary between a helpful agent and one that runs rm -rf unsupervised.
Context management
Long tasks generate a lot of conversation — file contents, command output, intermediate reasoning — which fills the context window. The SDK manages this automatically, the same way Claude Code does, so the agent can work on long jobs without you manually trimming history. This is applied context engineering, handled for you.
Extensibility: MCP, subagents, hooks
Beyond the built-ins, you can connect external systems through the Model Context Protocol (MCP) — databases, browsers, internal APIs. You can define subagents: specialized helpers your main agent delegates focused subtasks to. And hooks let you run your own code at lifecycle points (PreToolUse, PostToolUse, Stop, and more) to log, validate, or block actions.
A hands-on example
Here's a complete, runnable Python agent that finds every TODO comment in a codebase and writes a summary. Note how little code there is — no tool loop, no handlers, no parsing.
pip install claude-agent-sdk # Python 3.10+
export ANTHROPIC_API_KEY=sk-...import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions
async def main():
async for message in query(
prompt="Find all TODO comments in this repo and write a summary to TODOS.md",
options=ClaudeAgentOptions(
# Pre-approve a safe toolset. Edit/Write are needed to create the file.
allowed_tools=["Read", "Glob", "Grep", "Write"],
permission_mode="acceptEdits", # don't pause to confirm each edit
max_turns=15, # stop runaway loops
),
):
# `query()` streams messages as the agent works. The final
# message carries the result text.
if hasattr(message, "result"):
print(message.result)
asyncio.run(main())Run python todo_agent.py and the agent globs for source files, greps them for TODO, reads the relevant lines, writes TODOS.md, and prints a summary. The same program in TypeScript is nearly identical — the API mirrors across both SDKs:
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find all TODO comments and write a summary to TODOS.md",
options: {
allowedTools: ["Read", "Glob", "Grep", "Write"],
permissionMode: "acceptEdits",
maxTurns: 15,
},
})) {
if ("result" in message) console.log(message.result);
}Agent SDK vs the raw Claude API
This is the question most beginners have, and the answer is about who runs the tool loop. With the Anthropic client SDK (the raw Claude API), you send a prompt, Claude may ask to call a tool, and you write the code to execute that tool and send the result back — over and over, until it stops. With the Agent SDK, Claude handles that loop and the tools come built in.
- You implement the tool-execution loop
- You define and run every tool yourself
- Maximum control, more boilerplate
- Best for custom, tightly-scoped logic
- Lives in `anthropic` package
- SDK runs the agent loop for you
- File / shell / web tools built in
- Permissions + context handled
- Best for autonomous, multi-step tasks
- Lives in `claude-agent-sdk` package
| Concern | Raw Claude API | Claude Agent SDK |
|---|---|---|
| Tool loop | You write it | Built in |
| File / shell tools | You implement | Included (Read, Bash, ...) |
| Context management | Manual | Automatic |
| Permissions | Roll your own | allowed_tools, permission_mode |
| Best when | You need full control of every step | You want an agent that just works |
A useful rule of thumb: if your task is a single request-response (summarize this, classify that), use the raw API — the agent loop is overkill. If your task is multi-step and open-ended ("investigate this repo and fix the failing test"), the Agent SDK saves you from rebuilding the exact harness Anthropic already ships. There's also a hosted option, Managed Agents, where Anthropic runs the loop and a sandbox for you over a REST API — handy for production without operating your own infrastructure. The SDK, by contrast, runs in your process, working directly on your files.
Common pitfalls
- Giving it
Bashtoo early. TheBashtool runs real shell commands. Don't hand it to an untested agent on a real machine. Sandbox it, or start with read-only tools. - Forgetting
max_turns. Without a turn limit, a confused agent can loop and burn through tokens. Set a ceiling while you iterate. - No permission boundary. Running with everything allowed plus
acceptEditsandBashmeans the agent can modify or delete anything in its working directory. Scopeallowed_toolsdeliberately. - Trusting web content blindly. When the agent uses
WebFetchor reads files from outside sources, that text can contain instructions. This is prompt injection — treat fetched content as data, not commands, and keep dangerous tools off when fetching from the open web. - Expecting it to be deterministic. It's still an LLM agent. Two runs on the same task can take different paths. Build evals and guardrails before you trust it in production.
Going deeper
Once the basics click, the SDK's deeper features are where production agents are actually built. A few worth understanding:
Sessions, resume, and forking
Each run produces a session with an ID. You can capture it, then resume later to continue with full context — files already read, analysis already done. You can also fork a session to explore two approaches from the same starting point. Session state lives as JSONL on your filesystem, which makes it easy to inspect and replay. This is durable agent memory without a database.
Subagents and orchestration
Define an AgentDefinition with its own instructions and a restricted toolset, then let your main agent delegate to it via the Agent tool. A code-reviewer subagent might only get Read, Glob, and Grep. This is how you build a multi-agent system — a coordinator that fans focused subtasks out to specialists, each with a clean context window.
Hooks for control and observability
Hooks (PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit, and more) are callback functions that fire at lifecycle points. Use PreToolUse to block a tool call that touches a forbidden path; use PostToolUse to write an audit log of every file the agent changed. Hooks are the backbone of LLM observability and policy enforcement in agent systems.
Filesystem config and MCP at scale
The SDK reads Claude Code's .claude/ configuration — skills, slash commands, and a CLAUDE.md memory file — so an agent shares the same project context your team uses interactively. Combined with MCP servers for your internal tools, this is what turns a toy script into a real production agent. The open problems are the usual frontier ones: cost control on long autonomous runs, reliable evaluation of non-deterministic behaviour, and keeping a capable agent inside a safe permission boundary.
FAQ
What is the Claude Agent SDK?
It's Anthropic's library for building AI agents that autonomously read files, run commands, search the web, and edit code. It gives you the same agent loop, built-in tools, and context management that power Claude Code, available in Python (claude-agent-sdk) and TypeScript (@anthropic-ai/claude-agent-sdk).
What's the difference between the Claude Agent SDK and the Claude API?
With the raw Claude API you implement the tool-execution loop yourself — you run each tool and feed results back. The Agent SDK runs that loop for you and ships executable file, shell, and web tools built in. Use the raw API for single request-response tasks; use the Agent SDK for multi-step, autonomous work.
Is the Claude Agent SDK the same as the Claude Code SDK?
Yes. It was renamed from the Claude Code SDK to the Claude Agent SDK to signal it's for building any agent, not only coding ones. Older tutorials that mention the Claude Code SDK refer to the same library.
Do I need to install Claude Code to use the Agent SDK?
Not for TypeScript — the npm package bundles a native Claude Code binary for your platform as an optional dependency. You install @anthropic-ai/claude-agent-sdk (Node.js 18+) or claude-agent-sdk on pip (Python 3.10+), set an ANTHROPIC_API_KEY, and you're ready to run an agent.
How do I stop the agent from doing something dangerous?
Control its tools. Use allowed_tools to pre-approve only a safe set (e.g. Read, Glob, Grep for a read-only agent), pick a conservative permission_mode, and set max_turns to cap the loop. For high-risk tools like Bash, run the agent in a container or a throwaway directory.
Can the Claude Agent SDK connect to my own tools and databases?
Yes, through the Model Context Protocol (MCP). You pass MCP server configs in your options, and the agent gains those tools alongside the built-ins — databases, browsers, internal APIs, and hundreds of community servers. You can also write custom in-process tools.