
What Is smolagents? Hugging Face's Code-First Agents

Learn how Hugging Face's smolagents takes a minimal, code-first approach — the model writes and runs Python to act instead of emitting JSON tool calls — and where that pattern wins.

Intermediate · 9 min read · Updated 2026-06-13

In plain English

smolagents is a tiny open-source agent library from Hugging Face. Like any agent framework, it wires a language model into a loop so it can act — call a search API, run a calculation, read a file — and not just chat. What makes smolagents different is how the model acts: instead of emitting a structured JSON request that names a tool and its arguments, the model writes a short snippet of Python code, and your system runs it.

[Illustration: smolagents (image credit: pondhouse-data.com)]

Picture two ways of asking an assistant to get something done. The first is a fill-in-the-blank form: "Tool: web_search. Query: weather in Tokyo." The assistant can only pick a tool from a menu and fill the boxes. The second is handing the assistant a Python console and saying, "Just write the code." It might write results = web_search('weather in Tokyo'); print(results[0]) — looping, combining tools, and reacting to what comes back, all in a few lines. smolagents is built around that second style, which it calls a CodeAgent.

The name is literal: it is a small library ("smol" is the meme spelling of "small"), and its core agent logic is compact enough to read in one sitting. It is model-agnostic, so you can drive it with Claude, an OpenAI model, a local model, or anything on the Hugging Face Hub, and it leans on the Hub for ready-made tools and for sharing agents.

Why it matters

The standard way an agent acts today is function calling (also called tool use): you describe each tool to the model, and it replies with JSON saying which one to invoke and with what arguments. That works well, but it has a ceiling. Each step produces one tool call (or, with parallel tool calling, a batch of independent ones). To chain three tools where each needs the previous result, loop over a list, or branch on a result, the model needs a separate round-trip for every dependent action, and the orchestration logic has to live in your glue code.

Writing code collapses many of those steps into one. Research from Hugging Face and others, notably the CodeAct paper ("Executable Code Actions Elicit Better LLM Agents", Wang et al., 2024), found that letting a model express actions as code, where a single block can call several tools, store intermediate values in variables, and loop, tends to need fewer steps and produces more reliable multi-tool behavior than emitting one JSON call at a time. Code is simply a more natural language for composition than a flat list of tool invocations, and models have seen enormous amounts of Python during training.
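To make the step-count argument concrete, here is a small standard-library simulation. `get_temperature` is a hypothetical stub with made-up numbers, each JSON action stands for one model round-trip, and plain `exec` stands in for an executor. It illustrates the two action styles; it is not how smolagents itself counts or runs steps.

```python
import contextlib
import io
import json

# Hypothetical stub tool with made-up readings; no real weather API is called.
FAKE_TEMPS = {"Tokyo": 21.0, "Paris": 14.5, "Cairo": 30.0}

def get_temperature(city: str) -> float:
    return FAKE_TEMPS[city]

TOOLS = {"get_temperature": get_temperature}

def run_json_actions(actions: list[str]) -> tuple[list[float], int]:
    """JSON style: each model reply names exactly one tool call, so one step per call."""
    observations = []
    for raw in actions:  # each element stands for one model round-trip
        call = json.loads(raw)
        observations.append(TOOLS[call["name"]](**call["arguments"]))
    return observations, len(actions)

def run_code_action(snippet: str) -> tuple[str, int]:
    """Code style: one snippet can call tools in a loop; its printed output is the observation."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(snippet, dict(TOOLS))  # tools are plain functions in the snippet's namespace
    return buffer.getvalue(), 1  # the whole snippet counts as a single step

cities = ["Tokyo", "Paris", "Cairo"]
json_actions = [
    json.dumps({"name": "get_temperature", "arguments": {"city": c}}) for c in cities
]
code_action = (
    "temps = {c: get_temperature(c) for c in ['Tokyo', 'Paris', 'Cairo']}\n"
    "print(max(temps, key=temps.get))"
)

json_obs, json_steps = run_json_actions(json_actions)
code_obs, code_steps = run_code_action(code_action)
print(json_steps, code_steps)  # 3 1
print(code_obs.strip())        # Cairo
```

Note that the JSON side has only collected the three readings; picking the warmest city would cost it yet another round-trip, while the code action already did the comparison.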

Who should care about smolagents specifically?

  • Builders who want to understand agents, not just use them. The codebase is small enough to read end to end, so it is a great way to see the agent loop instead of treating it as a black box.
  • Teams already in the Hugging Face ecosystem. Tools, models, and even whole agents can be pulled from and pushed to the Hub, so sharing is built in.
  • Tasks that are heavy on composition — multi-step data wrangling, calling several APIs and combining results, anything where the action is naturally "a bit of code" rather than "pick one tool."
  • People who find big frameworks too heavy. If LangChain-scale abstractions feel like overkill, a small library with thin abstractions is refreshing.

How it works

Under the hood, smolagents runs a classic agent loop, closely related to the ReAct pattern of reason, then act. The twist is the action step: the model's output is a Python snippet, the snippet runs in an executor (by default a restricted local interpreter, optionally an isolated sandbox), whatever it prints comes back as an observation, and the loop repeats until the model calls a special final_answer tool.

Each tool you give the agent (a web search, a calculator, an API wrapper) is exposed inside that execution environment as a plain Python function the model can call by name. So the model isn't choosing from a menu; it is writing ordinary code that happens to call your functions. A single generated snippet can call two tools, do arithmetic on the results, and decide what to return, in one shot.
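Stripped of prompts and model calls, the loop described above fits in a few dozen lines of plain Python. In this sketch a scripted list stands in for the model, plain `exec` with captured stdout stands in for the executor (it is not a sandbox), and raising a hypothetical `FinalAnswer` exception is how `final_answer` ends the run. smolagents' real loop differs in the details; this only shows the act, observe, repeat shape.

```python
import contextlib
import io

class FinalAnswer(Exception):
    """Raised by the final_answer tool to stop the loop (an illustrative mechanism)."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value

def final_answer(value):
    raise FinalAnswer(value)

def get_temperature(city: str) -> float:
    # Hypothetical stub tool with made-up numbers.
    return {"Tokyo": 21.0, "Paris": 14.5}[city]

def run_agent(model, tools, max_steps=5):
    """Act-observe loop: get a snippet, run it, feed printed output back as an observation."""
    namespace = {tool.__name__: tool for tool in tools}  # tools become plain functions
    namespace["final_answer"] = final_answer
    memory = []  # (code, observation) pairs the model sees on its next turn
    for write_code in model[:max_steps]:
        code = write_code(memory)  # the "model" writes code given past observations
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)  # NOT a sandbox: plain exec in this process
        except FinalAnswer as done:
            return done.value, memory
        memory.append((code, buffer.getvalue()))
    raise RuntimeError("max_steps reached without a final answer")

# A scripted stand-in for the model: each entry maps memory -> next snippet.
scripted_model = [
    lambda memory: "t = get_temperature('Tokyo'); p = get_temperature('Paris'); print(t - p)",
    lambda memory: f"final_answer('Tokyo is warmer by {memory[-1][1].strip()} degrees C')",
]
answer, trace = run_agent(scripted_model, [get_temperature])
print(answer)  # Tokyo is warmer by 6.5 degrees C
```

Because `namespace` is reused across steps, variables such as `t` and `p` survive between snippets, much like working in one Python session.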

Defining a tool

A tool is just a Python function with a @tool decorator and a clear docstring. The decorator turns the function — and crucially its docstring and type hints — into a description the model reads, so it knows the tool exists and how to call it.

A tool the agent can call (Python):

```python
from smolagents import tool

@tool
def get_temperature(city: str) -> float:
    """Return the current temperature in Celsius for a city.

    Args:
        city: The city name, e.g. "Tokyo".
    """
    # call_weather_api is a placeholder for your own weather client (not defined here).
    data = call_weather_api(city)
    return data["temp_c"]
```
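To get a feel for what the decorator reads, here is a standard-library sketch that renders a function's signature, type hints, and docstring as text a model could read. `describe_tool` is a hypothetical helper written for illustration; smolagents' real `@tool` does more work, returns a `Tool` object rather than a string, and formats its description differently.

```python
import inspect
import textwrap
import typing

def describe_tool(func) -> str:
    """Render a function's signature, type hints, and docstring as model-readable text."""
    hints = typing.get_type_hints(func)
    returns = hints.pop("return", None)
    # Parameters without type hints are simply skipped in this sketch.
    params = ", ".join(f"{name}: {tp.__name__}" for name, tp in hints.items())
    ret = returns.__name__ if returns is not None else "None"
    doc = textwrap.indent(inspect.getdoc(func) or "", "    ")
    return f"def {func.__name__}({params}) -> {ret}:\n{doc}"

def get_temperature(city: str) -> float:
    """Return the current temperature in Celsius for a city.

    Args:
        city: The city name, e.g. "Tokyo".
    """
    return 21.0  # hypothetical stub instead of a real API call

description = describe_tool(get_temperature)
print(description)
```

The printed text starts with `def get_temperature(city: str) -> float:` followed by the docstring, which is why a vague docstring or a missing type hint directly degrades how well the model uses the tool.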

Running a CodeAgent

You hand the agent a model and a list of tools, then call run. Behind the scenes it builds the system prompt, runs the loop above, and returns the final answer.

A minimal CodeAgent (Python):

```python
# Requires: pip install "smolagents[litellm]", plus ANTHROPIC_API_KEY in the environment.
from smolagents import CodeAgent, LiteLLMModel

# Any provider works; here we drive it with a Claude model via LiteLLM.
model = LiteLLMModel(model_id="claude-sonnet-4-6")

# get_temperature is the @tool defined in the previous example.
agent = CodeAgent(tools=[get_temperature], model=model)

answer = agent.run(
    "Is it warmer in Tokyo or in Paris right now, and by how much?"
)
print(answer)
```

Given that prompt, the model might generate a single snippet like t = get_temperature('Tokyo'); p = get_temperature('Paris'); print(t - p). That is two tool calls and a subtraction in one step, so a JSON tool-calling agent that makes one call per step would need three round-trips (two lookups plus the answer) against the code agent's two. The printed difference becomes the observation, and the model then calls final_answer with a sentence for the user.

Code agent vs JSON tool-calling agent

The CodeAgent isn't the only style smolagents supports — it also ships a ToolCallingAgent that uses the familiar JSON approach. The interesting question is when each one wins. The difference is purely in how the model expresses an action.

| Situation | Better fit | Why |
|---|---|---|
| Combine 3 API results into one answer | Code agent | One snippet does it; no per-call round-trips |
| Loop over an unknown-length list | Code agent | A for loop is natural in code, awkward in JSON |
| Call a single tool and stop | Either | Both handle one discrete action fine |
| Run in a locked-down, no-exec environment | JSON agent | No arbitrary code means a smaller attack surface |
| Model is weak at writing code | JSON agent | Structured calls are easier for smaller models |

Security: running model-written code safely

The headline risk is obvious: you are executing code that a language model wrote. If that code runs with full access to your machine, a bad generation — or a prompt injection hiding in a web page the agent reads — could delete files, exfiltrate secrets, or make unintended network calls. This is the central trade-off of the code-first pattern, and smolagents takes it seriously.

The library's default executor is a restricted Python interpreter: it permits imports only from a small allow-list (which you can extend per agent), blocks dangerous built-ins, and caps how many operations a snippet can execute. That cuts down on casual mistakes, but a restricted interpreter is not a true security boundary on its own.
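To illustrate the allow-list idea, here is a hypothetical standard-library sketch that parses a snippet with Python's `ast` module and flags out-of-list imports and calls to a few dangerous built-ins before anything runs. It is not smolagents' interpreter (which walks the syntax tree and evaluates it itself instead of handing code to `exec`); in smolagents you widen the real allow-list with the `additional_authorized_imports` argument to `CodeAgent`. The last comment shows why checks like this are not a security boundary.

```python
import ast

ALLOWED_IMPORTS = {"math", "statistics", "datetime"}  # illustrative allow-list
BLOCKED_CALLS = {"eval", "exec", "open", "__import__", "compile"}

def check_snippet(code: str) -> list[str]:
    """Return the policy violations found in a snippet (an empty list means it passed)."""
    problems = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {name}")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call not allowed: {node.func.id}()")
    return problems

print(check_snippet("import math\nprint(math.sqrt(16))"))  # []
print(check_snippet("import os\nos.remove('data.csv')"))   # ['import not allowed: os']
print(check_snippet("open('secrets.txt').read()"))         # ['call not allowed: open()']

# Why this is not a boundary: a call such as getattr(__builtins__, "op" + "en")
# never names open directly, so it slips straight past the check above.
```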

For anything facing untrusted input or running in production, isolate execution properly. smolagents can run the generated code in an isolated environment instead, for example a Docker container or a remote sandbox service such as E2B, so that even malicious code is confined to that environment rather than running on your machine.
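In smolagents the executor is chosen when you build the agent; recent versions take an `executor_type` argument (for example `"docker"` or `"e2b"`), so check the documentation for the options your version supports. The underlying idea, running the snippet outside the agent's own process and reading back only its output, can be sketched with the standard library. The hypothetical `run_out_of_process` helper below uses a fresh interpreter, a timeout, and a stripped environment. That stops runaway loops and keeps API keys and the agent's variables out of reach, but it is still not a sandbox: the child can touch the filesystem and network like any other process.

```python
import os
import subprocess
import sys

# Near-empty environment so secrets such as API keys are not inherited.
# (SYSTEMROOT is kept only because Python on Windows needs it to start.)
SAFE_ENV = {k: v for k, v in os.environ.items() if k == "SYSTEMROOT"}

def run_out_of_process(code: str, timeout_s: float = 5.0) -> str:
    """Run a snippet in a fresh interpreter and return its output as the observation."""
    try:
        result = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs too long
            env=SAFE_ENV,
        )
    except subprocess.TimeoutExpired:
        return f"Error: execution exceeded {timeout_s} seconds"
    if result.returncode != 0:
        lines = result.stderr.strip().splitlines()
        return "Error: " + (lines[-1] if lines else f"exit code {result.returncode}")
    return result.stdout

print(run_out_of_process("print(sum(range(10)))"))          # 45
print(run_out_of_process("while True: pass", timeout_s=1))  # Error: execution exceeded 1 seconds
```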

Hub integration and multi-agent setups

Because it comes from Hugging Face, smolagents is wired into the Hub. You can pull a tool that someone else published as a Space and use it like a local function, and you can push your own agent to the Hub so others can load it in one line. Tools and agents become shareable artifacts, the same way models and datasets already are.

It also supports multi-agent structures. A common pattern is a manager agent that hands sub-tasks to specialist agents — for example one agent that only does web research and another that only writes code — by exposing each sub-agent to the manager as if it were a tool.
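The manager pattern can be sketched with plain callables: each "agent" below is a hypothetical stub with a name and description, and the manager's snippet calls them exactly as it would call any other tool. In smolagents itself you would hand real agents to the manager (recent versions use a `managed_agents` argument); this sketch only shows the shape of the idea.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubAgent:
    """A specialist exposed to the manager as if it were a tool (illustrative)."""
    name: str
    description: str  # what the manager's prompt would show about this specialist
    run: Callable[[str], str]

    def __call__(self, task: str) -> str:
        return self.run(task)

# Hypothetical specialists; real ones would run their own agent loops.
researcher = SubAgent(
    name="web_researcher",
    description="Searches the web and returns short notes.",
    run=lambda task: f"notes on {task}",
)
coder = SubAgent(
    name="python_coder",
    description="Writes and runs Python for data tasks.",
    run=lambda task: f"code output for {task}",
)

def manager_step(snippet: str, sub_agents: list[SubAgent]) -> str:
    """Run one manager snippet with each sub-agent callable by its name."""
    namespace = {agent.name: agent for agent in sub_agents}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(snippet, namespace)
    return buffer.getvalue()

manager_code = (
    "notes = web_researcher('Python agent libraries')\n"
    "print(python_coder('a table of ' + notes))"
)
print(manager_step(manager_code, [researcher, coder]))
```

Because sub-agents are just callables in the namespace, the manager can pipe one specialist's output into another inside a single snippet.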

And like most modern frameworks, smolagents can connect to tools exposed over the Model Context Protocol, so an MCP server's tools become available to the agent without custom wrappers.

Going deeper

Once the basic loop clicks, a few finer points separate a toy demo from something you'd ship.

The code-first pattern needs a capable model. A CodeAgent only shines when the underlying model writes correct Python reliably. With a weaker or smaller model, generated snippets fail to run, the agent burns steps recovering from syntax errors, and the JSON ToolCallingAgent — which constrains the model to a known shape — can be the more dependable choice. Match the agent style to the model's coding ability.
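Why weak coders burn steps is easy to see in a sketch: when a snippet fails, the loop has no result to show, only an error message, and that message becomes the next observation. The hypothetical `execute_step` helper below turns exceptions into observation text; the mismatched bracket stands in for a typical small-model mistake.

```python
import contextlib
import io
import traceback

def execute_step(code: str, namespace: dict) -> tuple[bool, str]:
    """Run one snippet; on failure, return the error text as the observation."""
    buffer = io.StringIO()
    try:
        compiled = compile(code, "<agent-step>", "exec")  # syntax errors surface here
        with contextlib.redirect_stdout(buffer):
            exec(compiled, namespace)
    except Exception as err:  # SyntaxError, NameError, etc. become observations
        message = traceback.format_exception_only(type(err), err)[-1].strip()
        return False, f"Error: {message}. Fix the code and try again."
    return True, buffer.getvalue()

namespace = {}
attempts = [
    "total = sum([1, 2, 3)\nprint(total)",   # mismatched bracket: a wasted step
    "total = sum([1, 2, 3])\nprint(total)",  # the retry that works
]
for step, code in enumerate(attempts, start=1):
    ok, observation = execute_step(code, namespace)
    print(step, ok, observation.strip())
```

Every failed attempt costs a full model round-trip, which is why the same task can take noticeably more steps with a model that writes shaky Python.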

Minimal is a feature and a limit. smolagents deliberately does not try to be a do-everything platform. It has no opinionated memory layer, no built-in RAG stack, no graph-based control flow. For document-heavy retrieval you might reach for LlamaIndex; for role-based crews, CrewAI; for conversational multi-agent debate, AutoGen. smolagents trades breadth for a clear, hackable core.

Observability matters more with code agents. Because each step can do a lot, a single bad snippet can derail a run in surprising ways. Log every generated code block and every observation, and step through traces when a run goes wrong — the visibility into what code ran is one of the underrated upsides of the pattern.
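A minimal version of that logging habit, assuming you control the step runner, records each step's code and observation in a trace list and emits them through Python's standard `logging` module. The `Step` record and `logged_step` wrapper are hypothetical; smolagents already records the steps of each run and supports tracing integrations, so in practice you would start from those.

```python
import contextlib
import io
import logging
import time
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent.trace")

@dataclass
class Step:
    number: int
    code: str
    observation: str
    seconds: float

def logged_step(number: int, code: str, namespace: dict, trace: list[Step]) -> str:
    """Run one snippet and record exactly what ran and what came back."""
    start = time.perf_counter()
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # a fuller version would also record failed steps
    step = Step(number, code, buffer.getvalue(), time.perf_counter() - start)
    trace.append(step)
    log.info("step %d code:\n%s", step.number, step.code)
    log.info("step %d observation: %r (%.3fs)", step.number, step.observation, step.seconds)
    return step.observation

trace: list[Step] = []
namespace: dict = {}
logged_step(1, "prices = [19.5, 20.25, 18.0]", namespace, trace)
logged_step(2, "print(round(sum(prices) / len(prices), 2))", namespace, trace)
```

When a run goes wrong, replaying `trace` step by step shows exactly which snippet produced the surprising observation.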

Where to go next. If you're deciding between libraries rather than learning one, the agent framework comparison and how to choose an agent framework lay out the trade-offs side by side. The honest summary for smolagents: reach for it when you value a small, readable core and your tasks are composition-heavy enough that letting the model write code — safely sandboxed — beats handing it a menu of JSON tools.

FAQ

What is smolagents used for?

smolagents is a lightweight Python library from Hugging Face for building AI agents — models that take actions like searching, calculating, or calling APIs in a loop. Its distinctive feature is the CodeAgent, where the model writes Python code to act instead of emitting JSON tool calls. People use it for multi-step, composition-heavy tasks and for learning how agent loops actually work.

What is the difference between a code agent and a tool-calling agent?

A tool-calling agent emits one structured JSON request per step naming a tool and its arguments. A code agent writes a snippet of Python that can call several tools, loop, branch, and store intermediate results in one step. Code is more expressive for composition, so it often needs fewer steps; JSON is simpler and safer because no arbitrary code runs.

Is smolagents safe? It runs model-generated code.

It can be, but only with proper isolation. The default executor is a restricted Python interpreter that blocks most dangerous operations, but that is not a hard security boundary. For untrusted input or production, run the generated code in a real sandbox — a container or a remote execution service — so even malicious code stays contained.

Which models work with smolagents?

It is model-agnostic. You can drive it with Claude, OpenAI models, local open-weight models, or anything on the Hugging Face Hub, through wrappers like LiteLLM or the Hub inference client. The CodeAgent works best with models that write correct Python reliably; weaker models often do better with the JSON ToolCallingAgent.

How is smolagents different from LangChain or CrewAI?

smolagents is deliberately minimal: a small, readable core centered on the code-first agent loop, with no built-in memory, RAG, or graph control flow. Bigger frameworks like LangChain or CrewAI offer far more out of the box at the cost of more abstraction. Choose smolagents when you want a hackable core and the code-writing pattern; choose the others when you need their broader features.
