AI/TLDR

What Is LangGraph?

Understand how LangGraph models an agent as a graph of nodes and shared state — and why that explicitness matters for loops, branching, checkpointing, and human oversight.

INTERMEDIATE · 11 MIN READ · UPDATED 2026-06-12

In plain English

LangGraph is a low-level orchestration library from the LangChain team that lets you build agents as explicit graphs. Instead of hiding the agent loop inside a black box, you draw out every step as a node (a Python or TypeScript function), every possible transition as an edge, and every piece of information the agent carries as a typed state object. The framework runs that graph for you — routing, looping, pausing for human input, and saving progress to disk along the way.

Think of it like a flowchart that actually executes. Each box in the flowchart is a node; each arrow is an edge. When you need to loop (call a tool, read the result, decide whether to call another tool or finish), you simply draw an edge that points back to an earlier node. When you need to branch ("if the tool raised an error, go to the retry node; otherwise go to the answer node"), you write a conditional edge — a function that returns the name of the next node to run.

LangGraph sits inside the LangChain ecosystem but works on its own: installing it pulls in langchain-core as a dependency, but you don't need the langchain package itself, and you can drop any LLM, tool, or retriever into a node, not just LangChain components. It's MIT-licensed and free to use.

Why it matters

Early LLM agent frameworks (including LangChain's own AgentExecutor) wrapped the reason-act loop inside a single method call. That worked for demos, but production agents hit real walls fast: the loop was a black box you couldn't easily inspect, you couldn't pause it mid-run to get a human's approval, you couldn't resume it after a crash, and branching logic required increasingly tortured workarounds.

LangGraph was built specifically to fix those walls. Its graph model turns every one of those pain points into a first-class feature:

  • Visibility: every node, its input state, and its output state are individually observable. LangSmith can trace them one by one.
  • Loops and cycles: unlike a chain (which is a directed acyclic graph — no loops allowed), LangGraph's state machine can revisit earlier nodes as many times as the logic requires.
  • Persistence and fault tolerance: LangGraph's built-in checkpointer saves the state after every node. If a run crashes, you resume from the last checkpoint rather than starting over.
  • Human-in-the-loop: the graph can pause at a designated node, surface a question or approval request to a human, and then continue once a response arrives — even hours later.
  • Multi-agent composition: entire subgraphs can be nested as a single node in a parent graph, so a team of specialized agents becomes one composable unit.

These aren't niche requirements. Any agent handling real tasks — filling out a form, writing and executing code, researching a topic across many tool calls — will encounter loops, failures, and moments where a human should review before the agent continues. LangGraph gives you the infrastructure to handle those situations without hacking around the framework.

How it works: nodes, edges, and state

Every LangGraph program is a StateGraph. You define it by answering three questions: what shared state flows through the graph, what nodes process that state, and what edges connect them. Once built, you compile the graph and call .invoke() (or .stream()) to run it.

State

State is a typed dictionary (or Pydantic model) that every node can read from and write to. A simple chatbot agent might carry just messages: list. A research agent might carry query, search_results, drafted_answer, and revision_count. Each node receives the full current state and returns a partial update — only the keys it changed.

Nodes

A node is any Python (or TypeScript) function that takes the current state and returns a dictionary of state updates. That's it. The function can call an LLM, run a tool, query a database, or do pure logic. Nodes are completely ordinary code — no magic base classes required.
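
To make the "partial update" idea concrete, here is a minimal sketch in plain Python with no LangGraph imports. The ResearchState fields come from the research-agent example above; the revise node and its trimming logic are hypothetical, chosen only to show that a node returns just the keys it changed.

```python
from typing import TypedDict


class ResearchState(TypedDict):
    query: str
    search_results: list
    drafted_answer: str
    revision_count: int


def revise(state: ResearchState) -> dict:
    """Hypothetical node: trim the draft and bump the revision counter."""
    return {
        "drafted_answer": state["drafted_answer"].strip(),
        "revision_count": state["revision_count"] + 1,
    }


state: ResearchState = {
    "query": "what is langgraph",
    "search_results": ["doc-1", "doc-2"],
    "drafted_answer": "  graphs make loops explicit.  ",
    "revision_count": 0,
}

# A node is an ordinary function, so it can be called and tested directly.
update = revise(state)
print(update)  # only the two changed keys, not the whole state
```

Because the node never touches query or search_results, it does not return them; the runtime is responsible for merging the update into the full state.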

Edges

Edges tell the runtime which node to run next. There are two kinds: fixed edges (graph.add_edge("a", "b") — always go from A to B) and conditional edges (graph.add_conditional_edges("a", routing_fn) — call a function on the current state and follow the returned node name). The special node name END terminates the run.

In a typical agent graph, an agent node and a tools node are connected in a loop: this is the canonical ReAct (Reason + Act) pattern made explicit. The agent node calls the LLM; the LLM emits either a tool call or a final answer. The conditional edge routes to the tools node for tool calls (which updates the messages list with the tool result and loops back) or to END for a final answer. You can read every line of this logic; nothing is hidden.

A minimal example

simple_agent.py (python)
# pip install langgraph langchain-anthropic
from typing import Annotated
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langchain_core.messages import ToolMessage
from typing_extensions import TypedDict

# 1. Define the shared state.
class State(TypedDict):
    messages: Annotated[list, add_messages]  # add_messages appends, not overwrites

# 2. Bind a tool to the model.
@tool
def get_word_count(text: str) -> int:
    """Count the words in a text string."""
    return len(text.split())

llm = ChatAnthropic(model="claude-sonnet-4-5").bind_tools([get_word_count])

# 3. Define nodes.
def agent(state: State) -> dict:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

def run_tools(state: State) -> dict:
    last = state["messages"][-1]
    results = []
    for call in last.tool_calls:
        fn = {"get_word_count": get_word_count}[call["name"]]
        results.append(ToolMessage(content=str(fn.invoke(call["args"])), tool_call_id=call["id"]))
    return {"messages": results}

def should_continue(state: State) -> str:
    last = state["messages"][-1]
    return "tools" if last.tool_calls else END

# 4. Build and compile the graph.
graph = StateGraph(State)
graph.add_node("agent", agent)
graph.add_node("tools", run_tools)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")   # loop back after every tool run
app = graph.compile()

# 5. Run it.
for chunk in app.stream({"messages": [{"role": "user", "content": "How many words are in 'the quick brown fox'?"}]}):
    print(chunk)

Notice step 4: graph.add_edge("tools", "agent") is the loop. After each tool execution the graph returns to the agent node to let the LLM reason about the result. The loop can run as many times as the LLM needs before routing to END. This is impossible with a plain linear chain.

Persistence and human-in-the-loop

Two of LangGraph's most powerful features go hand-in-hand: checkpointing saves state at each step, and human-in-the-loop (HITL) exploits that saved state to pause for input.

Checkpointing

Pass a checkpointer to graph.compile(). MemorySaver ships with the library for in-process storage; SQLite- and Postgres-backed savers are available as separate packages (langgraph-checkpoint-sqlite and langgraph-checkpoint-postgres) when you need durable storage. From that point on, the runtime saves the state to the store after every step. If a run fails, invoking the graph again with the same thread_id and None as the input resumes from the last saved checkpoint instead of starting over. For long-running workflows that span minutes or hours, this is non-negotiable.

persistence.py (python)
from langgraph.checkpoint.memory import MemorySaver

# Compile with checkpointing enabled.
app = graph.compile(checkpointer=MemorySaver())

# Each call with the same thread_id continues the same conversation.
config = {"configurable": {"thread_id": "user-42"}}
app.invoke({"messages": [{"role": "user", "content": "Hi"}]}, config=config)
app.invoke({"messages": [{"role": "user", "content": "What did I just say?"}]}, config=config)
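
To see why saving state after every step makes crash recovery possible, here is a toy sketch in plain Python. It is not LangGraph's checkpointer: the checkpoints dict, the run helper, and the fetch and summarize nodes are all hypothetical, and only illustrate the mechanism of resuming a thread from its last saved step.

```python
from typing import Callable

# Hypothetical in-memory store: thread_id -> (index of next node, saved state).
checkpoints: dict[str, tuple[int, dict]] = {}


def run(nodes: list[Callable[[dict], dict]], initial: dict, thread_id: str) -> dict:
    """Run nodes in order, saving a checkpoint after each one completes."""
    step, state = checkpoints.get(thread_id, (0, dict(initial)))
    while step < len(nodes):
        state = {**state, **nodes[step](state)}  # merge the partial update
        step += 1
        checkpoints[thread_id] = (step, state)   # persist progress
    return state


calls = {"fetch": 0, "summarize": 0}


def fetch(state: dict) -> dict:
    calls["fetch"] += 1
    return {"doc": "raw text"}


def summarize(state: dict) -> dict:
    calls["summarize"] += 1
    if calls["summarize"] == 1:
        raise RuntimeError("simulated crash")  # fails on the first attempt only
    return {"summary": state["doc"].upper()}


try:
    run([fetch, summarize], {}, thread_id="job-1")
except RuntimeError:
    pass  # the checkpoint written after fetch survives the crash

# Same thread_id: the run resumes at summarize; fetch is not repeated.
result = run([fetch, summarize], {}, thread_id="job-1")
print(result, calls)
```

The second call skips fetch entirely because its output was already saved under the thread's checkpoint, which is the behavior the thread_id mechanism is meant to provide.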

Human-in-the-loop

Sometimes you need a human to review or approve before the agent continues: for example, before sending an email, executing a database write, or taking an action above a cost threshold. LangGraph handles this with interrupts. The simplest form is a static breakpoint: compile the graph with a checkpointer and interrupt_before=["node_name"]. When the runtime reaches that node, it saves the state and pauses before running it. A human can inspect the state, optionally edit it (for example with update_state()), then resume by calling .invoke(None, config) with the same thread_id, and the run picks up exactly where it stopped.

This pattern is what separates a toy agent from a trustworthy one. For anything that touches money, external systems, or sensitive data, requiring human sign-off before irreversible actions is a best practice the LangGraph model makes genuinely easy to implement.

When to use LangGraph (vs. plain LangChain)

LangGraph is not always the right tool. The explicit graph model adds real cognitive overhead: you're writing boilerplate (state class, individual nodes, edges) that a higher-level helper would hide. That overhead pays off quickly when your agent needs to survive crashes and resume, pause for human approval, branch on state, or compose multiple specialist agents, but it's overkill for simple pipelines.

A practical rule of thumb: start with create_agent from plain LangChain, which gives you the tool-use loop with one line of code. Graduate to LangGraph the moment you hit something create_agent can't express — usually the first time you need checkpointing, a human pause, or a branch that depends on state.

Multi-agent systems

LangGraph is the recommended foundation for multi-agent systems in the LangChain ecosystem. The pattern is simple: compile a subgraph for each specialist agent (researcher, coder, editor), then add each subgraph as a single node in a supervisor graph. The supervisor node uses a conditional edge to decide which specialist to call next, and each specialist returns its result to the supervisor's state. The entire team is still one graph with one shared state — checkpointing, HITL, and time travel all work as normal.

Going deeper

Once the core graph model clicks, a few more LangGraph concepts separate basic usage from production-grade agents.

State reducers

By default, a node's returned dict overwrites the matching keys in state. For lists (like messages), you almost always want to append instead. LangGraph handles this with reducers: annotate a field with Annotated[list, add_messages] (or any custom merge function) and the runtime will call the reducer when it applies an update. This is how multi-turn memory works without any manual bookkeeping.
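
The overwrite-versus-reduce rule can be illustrated in plain Python. The sketch below is not LangGraph's implementation: apply_update is a hypothetical helper that reads a merge function out of the Annotated metadata and falls back to overwriting when none is present, with operator.add standing in for a list-appending reducer.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class State(TypedDict):
    revision_count: int                             # no reducer: overwrite
    search_results: Annotated[list, operator.add]   # reducer: concatenate


def apply_update(schema: type, state: dict, update: dict) -> dict:
    """Merge a node's partial update into state, honoring any reducer."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        merged[key] = reducer(merged[key], value) if reducer else value
    return merged


before = {"revision_count": 0, "search_results": ["a"]}
after = apply_update(State, before, {"revision_count": 1, "search_results": ["b"]})
print(after)
```

The plain int field is replaced, while the annotated list field is extended, which mirrors how a messages list grows across turns.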

Streaming

Call .stream() instead of .invoke() to receive updates as they happen. The stream_mode parameter controls granularity: "values" emits the full state after each step; "updates" emits just the dict each node returned, keyed by node name; "messages" emits LLM output token by token for live streaming to a UI. This is how you build a chatbot that renders the model's reply as it is being generated.

LangGraph Platform

The open-source library handles the runtime logic, but deploying a persistent, multi-tenant agent to production still requires infrastructure: a database for checkpoints, a task queue for async runs, a way to expose the graph over HTTP, and monitoring. LangGraph Platform (a managed service from LangChain) packages all of that, along with a Studio UI for visualizing and debugging graphs live. It's optional, since you can deploy the library on your own infra, but it removes significant ops work.

Honest limitations

LangGraph does not make your agent smart; that's still the LLM's job. It also doesn't stop an agent from looping unproductively: you're responsible for loop termination conditions, and the recursion_limit config value (25 steps by default) is only a backstop that aborts the run with a GraphRecursionError. The graph model adds boilerplate that smaller or simpler workflows don't need. And like all deep orchestration frameworks, a wrong state shape can produce cryptic errors far from the source. Use LLM observability tooling, LangSmith or equivalent, from day one, not as an afterthought.

FAQ

What is LangGraph and how is it different from LangChain?

LangGraph is LangChain's low-level orchestration library for building agents as explicit state machines — graphs of nodes, edges, and shared state. Plain LangChain provides high-level helpers (chains, create_agent) that hide the loop. LangGraph exposes it: you define every step and every routing decision yourself, gaining visibility, checkpointing, and human-in-the-loop support that the higher-level helpers can't easily provide.

Do I need LangChain to use LangGraph?

No. LangGraph is a separate package (pip install langgraph) and does not need the langchain package. Installing it does pull in langchain-core as a dependency, but nodes are plain functions, so you can call any LLM provider (Anthropic, OpenAI, a local model) through its own SDK without using LangChain components. The two libraries integrate well, but you can use LangGraph without building on LangChain.

How does human-in-the-loop work in LangGraph?

Compile the graph with a checkpointer and interrupt_before=["node_name"]. When the runtime reaches that node, it saves the current state to the checkpointer and pauses before running it. A human can inspect the state (and optionally edit it), then call .invoke(None, config) with the same thread_id to resume. The agent continues from exactly where it stopped, with any human edits already applied to the state.

What is a LangGraph checkpointer?

A checkpointer is a storage backend that LangGraph uses to save the graph state after every step. MemorySaver (in-process, useful for testing) ships with the library; SqliteSaver and a Postgres-backed saver are installed as separate packages. Checkpointing enables fault-tolerant long-running agents (resume after crashes), multi-turn memory (same thread_id across calls), and human-in-the-loop pauses.

When should I use LangGraph instead of a simple agent loop?

Use LangGraph when your agent needs to loop over tools, survive crashes and resume, pause for human approval, branch on complex conditions, or compose multiple specialist agents. For a single prompt-and-response or a simple one-pass pipeline, the overhead isn't worth it — use create_agent or call the LLM directly.

Is LangGraph free and open source?

Yes, the core LangGraph library is MIT-licensed and free to use. LangGraph Platform (the managed deployment service with Studio UI, async task queue, and hosted checkpointing) is a separate commercial product. You can deploy the open-source library on your own infrastructure without any paid tier.

Further reading