In plain English
Every agent framework is a different way of thinking about agents before it is a library. LangGraph asks you to draw a flowchart. The bare agent loop asks you to write a while loop. CrewAI asks you to cast a TV production crew. These three pictures — the graph, the loop, and the crew — are not just API styles; they are mental models that determine how you structure problems, what you name things, and where bugs tend to hide.
Think of it like navigation. One person thinks about a road trip as a turn-by-turn graph (nodes are intersections, edges are roads). Another person thinks of it as a recursive loop — while destination not reached: evaluate position, take next best action. A third person thinks of it as a team assignment — the driver drives, the navigator reads the map, the co-pilot looks for hazards. All three reach the same destination; they just frame the problem differently. When you learn a new agent framework, the first question is: which of these three pictures is it using?
Why mental models matter more than APIs
The framework's mental model determines how you decompose problems. If you reach for LangGraph and your problem is naturally a linear sequence of role handoffs, you spend days wrestling with graph edges and node state to express something CrewAI would represent in 15 lines. If you reach for a bare agent loop for a workflow that has 12 conditional branches and requires resuming after a crash, you end up hand-rolling a worse version of what LangGraph gives you for free.
Mental models also surface different kinds of bugs. In the loop model, the classic bug is an infinite loop: the agent keeps calling tools without converging. In the graph model, the classic bug is a wiring error such as an unreachable node (a node that no edge ever routes to) or its mirror image, a dead end (a node the agent enters from which there is no path to END). In the crew model, the classic bug is task contamination: a downstream agent receives context from a previous agent that silently biases its output. Knowing the model tells you where to look when things go wrong.
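The graph-model bug is the one that can be checked mechanically before anything runs. The following is a minimal sketch in plain Python, not any framework's built-in validation: the edge map, the node names, and the END sentinel are all made up for illustration. It reports nodes that nothing routes to and nodes that can be entered but can never reach END.

```python
from collections import deque

END = "END"  # hypothetical sentinel name, not imported from any framework


def reachable(start, edges):
    """Return every node reachable from `start` by following edges (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def wiring_problems(entry, edges):
    """Find nodes that can never run and nodes that can never reach END."""
    nodes = set(edges) | {n for targets in edges.values() for n in targets}
    nodes.discard(END)
    from_entry = reachable(entry, edges)
    unreachable = sorted(nodes - from_entry)
    dead_ends = sorted(n for n in nodes & from_entry if END not in reachable(n, edges))
    return unreachable, dead_ends


# Illustrative wiring with one bug of each kind.
edges = {
    "planner": ["retriever"],
    "retriever": ["retriever", "answer", "cleanup"],
    "answer": [END],
    "cleanup": [],        # can be entered, but has no way out
    "summarizer": [END],  # nothing routes here
}
unreachable, dead_ends = wiring_problems("planner", edges)
print(unreachable, dead_ends)
```

A check like this could run in a unit test next to the graph definition, so a wiring mistake fails fast instead of surfacing as a hung run.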
Finally, mental models determine how you communicate with your team. Saying "we need to add a conditional edge from the planner to the retriever" means something precise in LangGraph. "We need a new task between researcher and writer" means something precise in CrewAI. Speaking the framework's native vocabulary reduces ambiguity in code reviews and architecture discussions.
The three mental models
Three mental models dominate the agent framework landscape. Understanding each one on its own terms — not just as a comparison — is the fastest path to fluency in any specific framework.
The loop:
- Core unit: a while-loop iteration
- State: a growing message history
- Control: LLM decides next action
- Vocab: think, act, observe, repeat
- Exemplars: ReAct pattern, OpenAI Agents SDK
- Bug pattern: infinite loop
The graph:
- Core unit: node + edge
- State: a typed shared state object
- Control: explicit conditional edges
- Vocab: nodes, edges, StateGraph, checkpoints
- Exemplars: LangGraph
- Bug pattern: unreachable node
The crew:
- Core unit: role-based agent + task
- State: task outputs passed between agents
- Control: sequential or hierarchical delegation
- Vocab: agent, role, task, crew, process
- Exemplars: CrewAI
- Bug pattern: task context contamination
Mental model 1: The loop
The loop model is the closest to how LLM agents actually work under the hood. Every agent is a while loop: send the current message history to the LLM, get a response, check whether that response is a final answer or a tool call, execute the tool if needed, append the result to history, and repeat. The loop terminates when the LLM produces a final text answer or a maximum-step limit is hit.
The formal version of this model is the ReAct pattern (Reasoning + Acting), introduced in a 2022 paper. The LLM alternates between generating a thought (reasoning about what to do next), an action (calling a tool), and an observation (receiving the tool's result). This cycle continues until the LLM is confident enough to emit a final answer.

    # Pseudocode: llm, available_tools, and execute_tool are placeholders.
    def run_agent(user_input):
        messages = [{"role": "user", "content": user_input}]
        while True:
            response = llm.call(messages, tools=available_tools)
            if response.is_final_answer:
                return response.text
            # LLM decided to call a tool: record the request, then the result
            messages.append({"role": "assistant", "tool_call": response.tool_call})
            tool_result = execute_tool(response.tool_call)
            messages.append({"role": "tool", "content": tool_result})
            # loop: send updated history back to LLM

Frameworks built on the loop model, including the OpenAI Agents SDK and raw API-based approaches, wrap this core with robustness features: max-iteration guards, tool error handling, streaming responses, and conversation history management. The mental model stays a loop; the framework just makes the loop production-grade.
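To make two of those robustness features concrete, here is a small runnable sketch of a loop with a max-iteration guard and tool error handling. It is not taken from any SDK: the Reply type, run_agent signature, and scripted_llm stand-in are assumptions invented for this example, and a real implementation would call an actual model API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    """Hypothetical LLM reply: either final text or a tool request."""
    text: Optional[str] = None
    tool_name: Optional[str] = None
    tool_arg: Optional[str] = None


def run_agent(llm: Callable[[list], Reply], tools: dict, user_input: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):  # max-iteration guard
        reply = llm(messages)
        if reply.tool_name is None:
            return reply.text  # a final answer ends the loop
        messages.append({"role": "assistant", "content": f"call {reply.tool_name}({reply.tool_arg!r})"})
        try:
            result = str(tools[reply.tool_name](reply.tool_arg))
        except Exception as exc:  # report the failure to the LLM instead of crashing
            result = f"tool error: {exc}"
        messages.append({"role": "tool", "content": result})
    raise RuntimeError(f"no final answer after {max_steps} steps")


def scripted_llm(messages: list) -> Reply:
    """Stand-in for a real model: look something up once, then answer."""
    if messages[-1]["role"] == "tool":
        return Reply(text=f"Answer based on: {messages[-1]['content']}")
    return Reply(tool_name="lookup", tool_arg="capital of France")


answer = run_agent(scripted_llm, {"lookup": lambda q: "Paris"}, "What is the capital of France?")
print(answer)
```

The guard turns the classic infinite-loop bug into an explicit error, which is usually easier to alert on than a run that never returns.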
Mental model 2: The graph
The graph model, exemplified by LangGraph, reimagines agent workflows as directed graphs — the same data structure used in compilers, state machines, and workflow engines. Each node is a Python function that reads from a shared state object, does computation (calls an LLM, invokes a tool, transforms data), and writes back to state. Each edge defines what happens next: a plain edge always goes to the same node; a conditional edge calls a router function that returns the name of the next node at runtime.
The shared state object is the most important concept in the graph model. It is a typed dictionary (a Python TypedDict) that all nodes read from and write to. This is different from the loop model's message history, which is an append-only list of conversational turns. The graph model's state can hold anything: intermediate results, flags, structured data, lists of sub-tasks. Because state is typed and explicit, it is much easier to reason about what has happened and what comes next.

    from typing import TypedDict
    from langgraph.graph import StateGraph, START, END

    class AgentState(TypedDict):
        query: str
        tool_results: list[str]
        confidence: float
        final_answer: str

    def retriever_node(state: AgentState) -> dict:
        # fetch documents and return a partial state update
        # (search is a placeholder for your own retrieval function)
        results = search(state["query"])
        return {"tool_results": results}

    def answer_node(state: AgentState) -> dict:
        # placeholder body: a real node would call an LLM here
        return {"final_answer": "\n".join(state["tool_results"])}

    def router(state: AgentState) -> str:
        """Conditional edge: which node comes next?"""
        # assumes a node or the initial input has set confidence
        if state["confidence"] >= 0.8:
            return "answer"
        return "retriever"  # loop back for more evidence

    graph = StateGraph(AgentState)
    graph.add_node("retriever", retriever_node)
    graph.add_node("answer", answer_node)
    graph.add_edge(START, "retriever")  # entry point
    graph.add_conditional_edges("retriever", router, {"retriever": "retriever", "answer": "answer"})
    graph.add_edge("answer", END)

Because the entire workflow is an explicit graph, LangGraph can do things the loop model cannot easily do: checkpoint the full state at any node so that a crashed agent can resume from exactly that point, replay from any past state for debugging, and pause execution at a designated node to ask a human for approval before continuing.
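The checkpoint-and-resume idea can be illustrated without any framework. The sketch below is plain Python and deliberately does not use LangGraph's checkpointer API; the node functions, the route function, the in-memory checkpoints list, and the fail_at crash switch are all assumptions made up for this example. It saves a snapshot before every node, simulates a crash, and resumes from the last snapshot rather than from the start.

```python
import copy

END = "END"  # hypothetical sentinel standing in for a framework's END


def retriever(state):
    return {"tool_results": state["tool_results"] + [f"doc about {state['query']}"],
            "confidence": state["confidence"] + 0.5}


def answer(state):
    return {"final_answer": f"{len(state['tool_results'])} docs support the answer"}


def route(node, state):
    """Router: loop on the retriever until confidence is high enough."""
    if node == "retriever":
        return "answer" if state["confidence"] >= 0.8 else "retriever"
    return END


NODES = {"retriever": retriever, "answer": answer}


def run(state, start, checkpoints, fail_at=None):
    """Run nodes until END, saving (next_node, state) before each step."""
    node = start
    while node != END:
        checkpoints.append((node, copy.deepcopy(state)))
        if node == fail_at:
            raise RuntimeError(f"simulated crash in {node}")
        state = {**state, **NODES[node](state)}  # merge the partial update
        node = route(node, state)
    return state


checkpoints = []
initial = {"query": "agents", "tool_results": [], "confidence": 0.0, "final_answer": ""}
try:
    run(initial, "retriever", checkpoints, fail_at="answer")
except RuntimeError:
    pass
node, saved = checkpoints[-1]          # last snapshot taken before the crash
final = run(saved, node, checkpoints)  # resume from it, not from the start
print(final["final_answer"])
```

The resumed run does not repeat the two retrieval steps, which is the practical payoff of explicit state: expensive work done before a crash is not thrown away.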
Mental model 3: The crew
The crew model, embodied by CrewAI, thinks about agents the way a TV production company thinks about staff: you hire specialists, assign them roles, and give them tasks to execute in sequence. There is no explicit state graph and no hand-rolled while loop. You declare agents (a Researcher, a Writer, a Reviewer), give each one a role, goal, backstory, and tool set, then assemble them into a Crew with an ordered list of Task objects.
In the default sequential process, tasks execute in the order you define them. Each task's output becomes available as context to the next task automatically. In the optional hierarchical process, you designate a manager agent that uses its own LLM to decide which worker agent should handle each task, enabling dynamic delegation. You do not write routing logic; the manager reasons about it.

    from crewai import Agent, Task, Crew, Process

    researcher = Agent(
        role="Senior Research Analyst",
        goal="Find accurate, up-to-date information on the given topic",
        backstory="You have 10 years of research experience and a nose for primary sources.",
        tools=[web_search_tool],
    )

    writer = Agent(
        role="Content Writer",
        goal="Turn research into a clear, readable summary",
        backstory="You write for a technical audience that values precision over fluff.",
    )

    research_task = Task(
        description="Research the latest developments in {topic}",
        expected_output="A bullet-point summary of 5-8 key findings with sources",
        agent=researcher,
    )

    write_task = Task(
        description="Write a 300-word article based on the research",
        expected_output="A polished article with an intro, body, and conclusion",
        agent=writer,
    )

    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, write_task],
        process=Process.sequential,
    )

    result = crew.kickoff(inputs={"topic": "LLM agent frameworks"})

The crew model's strength is how quickly it maps to real-world team structures. If you can describe your workflow as "a researcher gathers facts, a strategist decides what to do with them, a builder executes the plan, and a reviewer checks the output," CrewAI almost writes itself. The crew model's weakness is the same: it assumes your workflow is a team of sequential specialists. When you need fine-grained branching, parallel execution, or crash-safe checkpointing, you end up fighting the abstraction.
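The sequential hand-off at the heart of the crew model is simple enough to sketch in a few lines of plain Python. This is not how CrewAI is implemented; Worker, Job, and run_sequential are hypothetical names chosen for illustration, and the lambdas stand in for LLM-backed agents. The point is only to show where context flows, which is also where task contamination comes from.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Worker:
    """Hypothetical stand-in for a role-based agent."""
    role: str
    act: Callable[[str, str], str]  # (task description, context) -> output


@dataclass
class Job:
    description: str
    worker: Worker


def run_sequential(jobs: list) -> list:
    """Run jobs in order; each job sees the previous job's output as context."""
    outputs, context = [], ""
    for job in jobs:
        result = job.worker.act(job.description, context)
        outputs.append(result)
        context = result  # whatever this job emits, the next one inherits
    return outputs


researcher = Worker("Researcher", lambda task, ctx: "finding A; finding B")
writer = Worker("Writer", lambda task, ctx: f"Article built from [{ctx}]")

outputs = run_sequential([
    Job("Research agent frameworks", researcher),
    Job("Write a short article", writer),
])
print(outputs[-1])
```

Because the writer receives the researcher's output wholesale, any stray opinion or error in that output travels downstream with it, so inspecting each intermediate output is the natural debugging step.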
Tradeoffs at a glance
Every mental model makes specific tradeoffs between expressiveness, learning curve, debuggability, and speed to first working prototype. The table below captures the key dimensions.
| Dimension | Loop | Graph (LangGraph) | Crew (CrewAI) |
|---|---|---|---|
| Learning curve | Lowest — just a while-loop | Highest — must learn graph concepts, state typing, and edge routing | Low — maps to intuitive team metaphors |
| Expressiveness | High — any logic fits in a loop | Very high — arbitrary branching, parallel nodes, cycles | Medium — sequential or hierarchical team flows |
| State management | Append-only message history | Typed shared dict, full checkpoint support | Task outputs passed forward; no mid-workflow checkpoints |
| Crash recovery | Typically manual | Built-in checkpointing; resume from any node | Restart from the beginning of the crew run |
| Debugging | Print statements on message list | Time-travel debug via saved checkpoints; LangSmith tracing | Inspect task outputs; CrewAI dashboard at enterprise tier |
| Speed to prototype | Fast for simple cases | Slow — graph wiring adds boilerplate | Very fast — 20-line pipelines are common |
| Best for | Single-agent tool loops, chatbots | Complex branching workflows, human-in-the-loop, production systems | Multi-agent team pipelines, content workflows, research tasks |
Mapping real frameworks onto the models
Once you can recognize the three mental models, it becomes easy to categorize any framework you encounter — including ones that were released after this article was written. Most frameworks are not pure instances of one model; they blend elements. The useful question is: what is the dominant abstraction, and what vocabulary does the framework use to express it?
The OpenAI Agents SDK: loop with guardrails
The OpenAI Agents SDK (released March 2025, a production successor to Swarm) is explicitly a loop model. Agents are defined with instructions and tools; the SDK manages the conversation loop, tool execution, and streaming. The one distinctive concept is the handoff — a tool call that transfers control to a different agent, bringing the current conversation state with it. This makes it possible to build multi-agent systems without a graph, by chaining agents through programmatic handoffs. The dominant vocabulary is: Agent, Runner, handoff, guardrail.
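A handoff can be pictured as a loop in which the active agent is itself a variable. The sketch below is a plain-Python illustration, not the SDK's API: SimpleAgent, run_with_handoffs, and the scripted respond functions are hypothetical and were named differently from the SDK's Agent and Runner on purpose.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimpleAgent:
    """Hypothetical agent: a name, a reply function, and allowed handoffs."""
    name: str
    respond: Callable[[list], tuple]  # history -> (text, handoff target or None)
    handoffs: dict = field(default_factory=dict)


def run_with_handoffs(agent: SimpleAgent, user_input: str, max_turns: int = 5):
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        text, target = agent.respond(history)
        history.append({"role": agent.name, "content": text})
        if target is None:
            return agent.name, history
        agent = agent.handoffs[target]  # control moves; the history goes with it
    raise RuntimeError("too many handoffs")


billing = SimpleAgent("billing", lambda h: (f"Refund issued for: {h[0]['content']}", None))
triage = SimpleAgent("triage", lambda h: ("Routing to billing", "billing"), {"billing": billing})

final_agent, history = run_with_handoffs(triage, "I was charged twice")
print(final_agent, len(history))
```

The billing agent can read the original complaint only because the whole history travels with the handoff, which is what lets this style build multi-agent flows without a graph.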
AutoGen / AG2: crew with a conversational backbone
AutoGen (and AG2, the community-led fork that split off in late 2024) blends the crew and loop models. You define ConversableAgent objects with roles, then put them into a group chat or a two-agent conversation. Agents take turns speaking, so the conversation itself is the control flow. This makes AutoGen particularly strong for human-in-the-loop scenarios, where a human is literally one of the agents in the chat. The dominant vocabulary is: ConversableAgent, GroupChat, UserProxyAgent, AssistantAgent.
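The idea that the conversation is the control flow can be sketched as a round-robin over speakers that share one transcript. This is a plain-Python illustration under stated assumptions, not AutoGen's or AG2's API: group_chat, the stop word, and both speaker functions are hypothetical, and the human is scripted so the example runs unattended.

```python
from typing import Callable

Speaker = Callable[[list], str]  # transcript -> next message


def group_chat(speakers: dict, opening: str, max_rounds: int = 3, stop_word: str = "APPROVED") -> list:
    """Round-robin chat: the shared transcript is the only control flow."""
    transcript = [("user", opening)]
    for _ in range(max_rounds):
        for name, speak in speakers.items():
            message = speak(transcript)
            transcript.append((name, message))
            if stop_word in message:
                return transcript
    return transcript


def assistant(transcript):
    drafts_so_far = sum(1 for name, _ in transcript if name == "assistant")
    return f"Draft v{drafts_so_far + 1}"


def human_reviewer(transcript):
    # A scripted human who approves the second draft; a real one would type a reply.
    return "APPROVED" if transcript[-1][1] == "Draft v2" else "Please revise"


transcript = group_chat({"assistant": assistant, "human": human_reviewer}, "Write a tagline")
print(transcript[-1])
```

The human is just another entry in the speakers mapping, which is why human-in-the-loop review falls out of this design without extra machinery.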
How to read any new framework
When you encounter a framework you have not seen before, ask three questions: (1) What is the core unit? (a node function, a while-loop iteration, a role-based agent) (2) How is state passed between steps? (typed shared dict, append-only message list, task output context) (3) What controls branching? (conditional edges, LLM tool calls, a manager agent's delegation). The answers place the framework in the map above within minutes, before you have written a single line of code.
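The three questions can be written down as a tiny lookup, which is sometimes handy when comparing several frameworks side by side. This is only a sketch: the Profile type, the MODELS table, and the count-the-matching-answers rule are assumptions for illustration, and the answer strings are the ones listed in the questions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    core_unit: str      # question 1
    state_passing: str  # question 2
    branching: str      # question 3


MODELS = {
    "loop": Profile("while-loop iteration", "append-only message list", "LLM tool calls"),
    "graph": Profile("node function", "typed shared dict", "conditional edges"),
    "crew": Profile("role-based agent", "task output context", "manager agent delegation"),
}

FIELDS = ("core_unit", "state_passing", "branching")


def classify(answers: Profile) -> str:
    """Return the model whose profile shares the most answers with `answers`."""
    def score(model: Profile) -> int:
        return sum(getattr(model, f) == getattr(answers, f) for f in FIELDS)
    return max(MODELS, key=lambda name: score(MODELS[name]))


# A hypothetical blended framework: role-based agents, but a shared
# message list and LLM-driven turns.
blended = Profile("role-based agent", "append-only message list", "LLM tool calls")
print(classify(blended))
```

A blended framework rarely matches one profile on all three answers, so the result is best read as the dominant abstraction rather than a strict category.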
Going deeper
Once you are comfortable with the three mental models, the next level is understanding where they break down — and what modern frameworks are doing to extend them.
When graphs get unwieldy
LangGraph's graph model earns its complexity for workflows with 8-15+ nodes, conditional branches, and crash-recovery requirements. But many teams find that as a graph grows beyond 20 nodes it becomes difficult to reason about — the visual diagram of the graph itself becomes a source of confusion rather than clarity. At that scale, teams often break the single graph into a hierarchy of subgraphs (LangGraph supports this natively with the add_node + compiled-graph approach), where each subgraph is itself a well-understood loop or crew. The mental models are not mutually exclusive; sophisticated production systems often combine all three.
Stateful loops vs. stateless graphs
A common confusion is that stateful means graph model. In fact, loops can be stateful too — the message history IS the state. The difference is the structure of the state: the loop model's state is a linear sequence (conversation history), while the graph model's state is a structured, typed dict that can hold arbitrary named fields. For simple workflows, linear state is all you need. When you find yourself adding multiple ad-hoc fields to your message history — stuffing structured data into assistant messages, parsing JSON out of tool outputs — that is the signal that you have outgrown the loop model's state structure and the graph model's typed state dict would clean things up.
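The difference in state structure is easiest to see with the same facts stored both ways. In this sketch the flight-search scenario, the find_field helper, and the TripState fields are all invented for illustration; the contrast between scanning and parsing a message list versus reading named fields is the part that carries over.

```python
import json
from typing import TypedDict

# Loop-style state: structured data smuggled into message content.
messages = [
    {"role": "user", "content": "Find flights to Lisbon"},
    {"role": "tool", "content": json.dumps({"kind": "flights", "cheapest": 212})},
    {"role": "tool", "content": json.dumps({"kind": "budget", "limit": 250})},
]


def find_field(history: list, kind: str, key: str):
    """Scan the history backwards and parse JSON to recover one value."""
    for msg in reversed(history):
        if msg["role"] != "tool":
            continue
        payload = json.loads(msg["content"])
        if payload.get("kind") == kind:
            return payload[key]
    raise KeyError(kind)


within_budget_loop = (
    find_field(messages, "flights", "cheapest") <= find_field(messages, "budget", "limit")
)


# Graph-style state: the same facts as named, typed fields.
class TripState(TypedDict):
    query: str
    cheapest_fare: int
    budget_limit: int


state: TripState = {"query": "Find flights to Lisbon", "cheapest_fare": 212, "budget_limit": 250}
within_budget_graph = state["cheapest_fare"] <= state["budget_limit"]
print(within_budget_loop, within_budget_graph)
```

Both versions reach the same decision, but the first needs a search-and-parse helper for every value it reads, which is the kind of ad hoc plumbing that signals the state has outgrown a flat message list.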
The emerging fourth model: choreography
A fourth mental model is gaining traction in 2025-2026: choreography, where agents communicate via a shared event bus rather than being centrally orchestrated. No single node, loop, or manager controls the flow; each agent subscribes to event types, reacts independently, and publishes new events that other agents pick up. Frameworks like Dapr Agents and event-driven patterns on top of LangGraph implement this model. It scales better to very large multi-agent systems where centralized orchestration becomes a bottleneck, but debugging is significantly harder because causality is implicit in the event stream rather than explicit in graph edges or task lists.
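A minimal in-process sketch shows what choreography looks like in code. It is not based on Dapr Agents or any other framework: the EventBus class, the event names, and the three handler functions are hypothetical, and a real deployment would put a message broker where the in-memory queue is.

```python
from collections import defaultdict, deque


class EventBus:
    """Hypothetical in-process bus; real systems would use a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self, max_events=100):
        # Deliver queued events in order; max_events guards against event storms.
        while self.queue and len(self.log) < max_events:
            event_type, payload = self.queue.popleft()
            self.log.append(event_type)
            for handler in self.handlers[event_type]:
                handler(self, payload)


def researcher(bus, topic):
    bus.publish("research.done", f"notes on {topic}")


def writer(bus, notes):
    bus.publish("draft.done", f"draft from {notes}")


def reviewer(bus, draft):
    bus.publish("review.done", f"approved: {draft}")


results = []
bus = EventBus()
bus.subscribe("topic.requested", researcher)
bus.subscribe("research.done", writer)
bus.subscribe("draft.done", reviewer)
bus.subscribe("review.done", lambda bus, text: results.append(text))
bus.publish("topic.requested", "agent frameworks")
bus.run()
print(bus.log)
```

No function here names the agent that runs next; the order exists only in the subscriptions, which is why the event log becomes the main debugging tool in this model.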
How mental models affect prompt engineering
The mental model you choose influences how you write prompts. In the loop model, one agent prompt does everything — it must handle tool selection, reasoning, and answer generation in a single system prompt that grows longer as the task grows. In the graph model, each node has a focused, small system prompt because the node only does one thing; routing logic lives in code, not in the LLM. In the crew model, each agent's system prompt is its role description — detailed backstory and goal statements that prime the LLM to behave like a specialist. The same content work — research, synthesis, and writing — looks completely different in each model's prompt architecture, which is why switching mental models often means rewriting prompts from scratch.
FAQ
What mental model does LangGraph use?
LangGraph uses the graph model: your agent workflow is a directed graph where nodes are Python functions that read and write a shared typed state object, and edges (including conditional edges) define what runs next. This gives you explicit control over branching, parallel execution, and crash-safe checkpointing, at the cost of more upfront boilerplate than simpler frameworks.
What is the ReAct loop and which frameworks use it?
ReAct (Reasoning + Acting) is the canonical loop mental model for agents: the LLM alternates between reasoning about what to do, calling a tool (acting), observing the result, and repeating until it can answer. Almost every agent framework implements this loop internally. Frameworks like the OpenAI Agents SDK and bare API-based approaches expose this loop directly; LangGraph and CrewAI use it inside nodes and tasks but add additional structure on top.
Is CrewAI better than LangGraph?
Neither is universally better. They use different mental models suited to different problems. CrewAI's crew model is faster for prototyping role-based multi-agent pipelines and has a gentle learning curve. LangGraph's graph model gives you precise control over branching, checkpointing, and complex state, but requires more setup. If your workflow looks like a team of specialists passing work down a chain, start with CrewAI. If it has many conditional branches or needs crash recovery, use LangGraph.
Can I mix mental models in the same agent system?
Yes, and production systems often do. A common pattern is to use a LangGraph graph as the outer orchestrator (for its state management and checkpointing), with individual nodes that internally run a CrewAI crew or a bare agent loop. LangGraph explicitly supports subgraphs, making it possible to encapsulate a full crew or loop as a single node in a larger graph.
Which mental model should beginners start with?
Start with the loop model — it is the closest to how LLMs actually work, has no framework-specific vocabulary to learn, and makes bugs obvious. Build a working agent with a raw API call and a while-loop before reaching for any framework. Once the loop becomes unwieldy (too many branches, state growing complex), you will know exactly which framework feature you need and why.
Why does the choice of mental model affect debugging?
Because each model has a different causal structure, bugs manifest differently. In the loop model, you trace through a growing message list to see what the LLM was told and what it responded. In the graph model, you inspect which node produced which state transition — LangGraph's checkpoint system lets you replay from any past state. In the crew model, you check each task's expected and actual output to find where context was dropped or contaminated.