AI/TLDR

Weave

Trace, evaluate, and debug LLM and agent apps with a Python decorator

Overview

Weave is a Python toolkit from Weights & Biases for building generative AI applications. You add a decorator to the functions you want to track, and Weave logs their inputs, outputs, and the full call tree so you can see what your app actually did at each step.

It is meant for developers working with language models, whether you call hosted APIs like OpenAI, Anthropic, or Google AI Studio, or run open-source models from Hugging Face. As an LLM observability and tracing tool, it gives you a record of every traced call and a place to compare evaluation runs as you move from experiments to production.

Beyond tracing, Weave helps you build repeatable, apples-to-apples evaluations for your use case and keep all the data generated across the LLM workflow organized in one place.

What it does

  • Trace any function by decorating it with @weave.op, building a tree of inputs and outputs
  • Works with calls to OpenAI, Anthropic, Google AI Studio, Hugging Face, and other models
  • Log and debug language model inputs, outputs, and traces during development
  • Build rigorous, apples-to-apples evaluations for your language model use cases
  • Organize information across the whole LLM workflow, from experimentation to production
  • Free tier available with a Weights & Biases account

Getting started

You need Python 3.10 or higher and a Weights & Biases account (free tier available). Then install Weave, initialize a project, and decorate the functions you want to trace.
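The prerequisites above can be checked before installing anything. This is a minimal sketch using only the standard library; `check_prerequisites` and `MIN_PYTHON` are hypothetical names chosen for illustration and are not part of Weave.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide (assumption: adjust if the docs change)


def check_prerequisites(version=None, package="weave"):
    """Return a list of human-readable problems; an empty list means ready."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {version[0]}.{version[1]}"
        )
    # find_spec returns None when a top-level package cannot be imported
    if importlib.util.find_spec(package) is None:
        problems.append(f"package '{package}' is not installed; run: pip install {package}")
    return problems


for problem in check_prerequisites():
    print(problem)
```

If nothing is printed, the interpreter version and the package are both in place. The check does not cover the Weights & Biases account, which you still need to create separately.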

Install Weave

Install the package from PyPI.

bash
pip install weave

Import and initialize

Initialize Weave with a project name. This connects your traces to your Weights & Biases project.

python
import weave
weave.init("my-project-name")

Trace your functions

Decorate any function you want to track with @weave.op to capture its inputs and outputs.

python
import weave

@weave.op
def my_function():
    # Your tracked code goes here.
    pass
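To make "capture its inputs and outputs" concrete, here is a toy decorator that records each call in memory. It is a sketch of the general idea only, not Weave's implementation: `record_io`, `CALL_LOG`, and `greet` are hypothetical names, and a real tracer sends records to your project rather than a Python list.

```python
import functools
import inspect

CALL_LOG = []  # hypothetical in-memory store standing in for a tracing backend


def record_io(func):
    """Toy stand-in for a tracing decorator: log named inputs and the output."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments to parameter names, including defaults.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        output = func(*args, **kwargs)
        CALL_LOG.append(
            {"op": func.__name__, "inputs": dict(bound.arguments), "output": output}
        )
        return output

    return wrapper


@record_io
def greet(name: str, punctuation: str = "!"):
    return f"Hello, {name}{punctuation}"


greet("Ada")
print(CALL_LOG)
```

Binding arguments to parameter names is what lets a record show `name` and `punctuation` rather than an anonymous tuple, which is the kind of detail that makes a logged call readable later.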

Trace a call tree

Decorating several functions produces a trace tree showing how inputs and outputs flow between them.

python
import weave

weave.init("weave-example")  # traces are logged to this project

@weave.op
def sum_nine(value_one: int):
    return value_one + 9

@weave.op
def multiply_two(value_two: int):
    return value_two * 2

@weave.op
def main():
    # Each nested op call appears as a child of main in the trace tree.
    output = sum_nine(3)
    final_output = multiply_two(output)
    return final_output

main()  # returns (3 + 9) * 2 = 24

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

  • Debug a generative AI app by inspecting the recorded inputs, outputs, and traces of each model call
  • Compare model or prompt changes with repeatable, apples-to-apples evaluations
  • Track an agent's multi-step function calls as a single trace tree
  • Keep experiments, evaluations, and production data for an LLM project organized in one place
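The "apples-to-apples" comparison in the list above comes down to holding the dataset and the scoring function fixed while only the candidate changes. The sketch below shows that pattern with plain Python. Everything in it is hypothetical (`DATASET`, `exact_match`, `evaluate`, and the two stand-in candidates), and it does not use Weave's own evaluation tooling, which the official documentation covers.

```python
from statistics import mean

# Hypothetical fixed dataset: every candidate sees exactly the same examples.
DATASET = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "3 * 3", "expected": "9"},
    {"question": "10 - 7", "expected": "3"},
]


def exact_match(expected: str, output: str) -> float:
    """Score 1.0 when the output equals the expected answer, ignoring outer whitespace."""
    return 1.0 if output.strip() == expected else 0.0


def evaluate(candidate, dataset=DATASET, scorer=exact_match) -> float:
    """Run one candidate over the whole dataset and return its mean score."""
    return mean(scorer(row["expected"], candidate(row["question"])) for row in dataset)


# Stand-ins for two model or prompt variants; a real app would call an LLM here.
def candidate_a(question: str) -> str:
    answers = {"2 + 2": "4", "3 * 3": "9", "10 - 7": "3"}
    return answers.get(question, "")


def candidate_b(question: str) -> str:
    return "4"  # always gives the same answer


for name, candidate in [("candidate_a", candidate_a), ("candidate_b", candidate_b)]:
    print(name, round(evaluate(candidate), 2))
```

Because both candidates are scored on identical rows with an identical scorer, a difference in the mean reflects the candidate and not the test setup. Swapping in a different scorer or dataset for only one candidate would break that guarantee.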

How Weave compares

Weave alongside other open-source observability and LLMOps tools that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
Langfuse | ★ 29.9k | A self-hostable platform for tracing LLM and agent calls, managing prompts, and running evaluations to debug and improve AI applications.
Opik | ★ 20k | An open-source platform from Comet for tracing, evaluating, and monitoring LLM applications, RAG systems, and agent workflows with dashboards and LLM-as-judge metrics.
TensorZero | ★ 11.7k | An open-source LLMOps platform that puts a single gateway in front of every major LLM provider and adds observability, evaluation, optimization, and A/B testing.
Evidently | ★ 7.6k | A monitoring and evaluation framework for ML and LLM systems that tracks output quality, drift, and test results over time with reports and dashboards.
OpenLLMetry | ★ 7.2k | An OpenTelemetry-based SDK that auto-instruments LLM providers, vector databases, and frameworks so traces flow into any existing observability backend.
Helicone | ★ 5.9k | A proxy-based observability platform that logs, monitors, and evaluates LLM API calls by routing requests through its endpoint with one line of code.
AgentOps | ★ 5.7k | An SDK for monitoring AI agents that tracks LLM cost, session replays, and performance across frameworks like CrewAI, LangChain, and the OpenAI Agents SDK.
Weave | ★ 1.1k | Trace, evaluate, and debug LLM and agent apps with a Python decorator.