Weave

Trace, evaluate, and debug LLM and agent apps with a Python decorator

github.com/wandb/weave · ★ 1.1k · weave-docs.wandb.ai

Overview

Weave is a Python toolkit from Weights & Biases for building generative AI applications. You add a decorator to the functions you want to track, and Weave logs their inputs, outputs, and the full call tree so you can see what your app actually did at each step.

It is meant for developers working with language models, whether you call hosted APIs like OpenAI, Anthropic, or Google AI Studio, or run open-source models from Hugging Face. As an LLM observability and tracing tool, it gives you a record of every traced call and a place to compare evaluation runs as you move from experiments to production.

Beyond tracing, Weave helps you build repeatable, apples-to-apples evaluations for your use case and keep all the data generated across the LLM workflow organized in one place.

What it does

Trace any function by decorating it with @weave.op, building a tree of inputs and outputs
Works with calls to models from OpenAI, Anthropic, Google AI Studio, Hugging Face, and others
Log and debug language model inputs, outputs, and traces during development
Build rigorous, apples-to-apples evaluations for your language model use cases
Organize information across the whole LLM workflow, from experimentation to production
Free tier available with a Weights & Biases account

Getting started

You need Python 3.10 or higher and a Weights & Biases account (free tier available). Then install Weave, initialize a project, and decorate the functions you want to trace.

Install Weave

Install the package from PyPI.

bash

pip install weave
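
If the install or a later import fails, a quick preflight check can narrow down the cause. The sketch below uses only the standard library and is not part of Weave; `check_environment` and `MIN_PYTHON` are hypothetical names introduced here, and the 3.10 minimum is the one stated in this guide.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def check_environment(version_info=sys.version_info, package="weave"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher required"
        )
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        problems.append(f"package '{package}' is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

Running the script prints nothing when both requirements are met.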

Import and initialize

Initialize Weave with a project name. This connects your traces to your Weights & Biases project.

python

import weave
weave.init("my-project-name")

Trace your functions

Decorate any function you want to track with @weave.op to capture its inputs and outputs.

python

import weave

@weave.op
def my_function():
    # Put the code you want tracked here.
    pass

Trace a call tree

Decorating several functions produces a trace tree showing how inputs and outputs flow between them.

python

import weave
weave.init("weave-example")

@weave.op
def sum_nine(value_one: int):
    return value_one + 9

@weave.op
def multiply_two(value_two: int):
    return value_two * 2

@weave.op
def main():
    output = sum_nine(3)
    final_output = multiply_two(output)
    return final_output

main()

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Debug a generative AI app by inspecting the recorded inputs, outputs, and traces of each model call
Compare model or prompt changes with repeatable, apples-to-apples evaluations
Track an agent's multi-step function calls as a single trace tree
Keep experiments, evaluations, and production data for an LLM project organized in one place
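
As a rough illustration of what "apples-to-apples" means in the comparison use case, the sketch below scores two variants against the same fixed dataset with the same scorer, so any difference in score comes from the variant alone. It uses only the standard library and does not call Weave's evaluation API; the dataset, `exact_match`, `variant_a`, `variant_b`, and `evaluate` are hypothetical stand-ins for your own data, metric, and model calls.

```python
from statistics import mean

# Hypothetical fixed dataset: every variant is scored on exactly these rows.
DATASET = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "3 * 3", "expected": "9"},
    {"question": "10 - 7", "expected": "3"},
]


def exact_match(expected: str, output: str) -> float:
    """Scorer: 1.0 when the output equals the expected answer, else 0.0."""
    return 1.0 if output.strip() == expected else 0.0


def variant_a(question: str) -> str:
    """Stand-in for one model or prompt version; always answers '4'."""
    return "4"


def variant_b(question: str) -> str:
    """Stand-in for a second version; evaluates the simple arithmetic."""
    left, operator, right = question.split()
    a, b = int(left), int(right)
    return str({"+": a + b, "-": a - b, "*": a * b}[operator])


def evaluate(predict, dataset=DATASET, scorer=exact_match) -> float:
    """Run one variant over the shared dataset and return its mean score."""
    return mean(scorer(row["expected"], predict(row["question"])) for row in dataset)


# Same rows, same scorer, different variants.
results = {fn.__name__: evaluate(fn) for fn in (variant_a, variant_b)}
print(results)
```

Here `variant_a` gets one of three rows right and `variant_b` gets all three. In a real project the variants would be model or prompt versions, and holding the dataset and scorer constant is what makes the runs comparable.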

How Weave compares

Weave alongside the other open-source observability and LLMOps tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Langfuse	★ 29.9k	A self-hostable platform for tracing LLM and agent calls, managing prompts, and running evaluations to debug and improve AI applications.
Opik	★ 20k	An open-source platform from Comet for tracing, evaluating, and monitoring LLM applications, RAG systems, and agent workflows with dashboards and LLM-as-judge metrics.
TensorZero	★ 11.7k	An open-source LLMOps platform that puts a single gateway in front of every major LLM provider and adds observability, evaluation, optimization, and A/B testing.
Evidently	★ 7.6k	A monitoring and evaluation framework for ML and LLM systems that tracks output quality, drift, and test results over time with reports and dashboards.
OpenLLMetry	★ 7.2k	An OpenTelemetry-based SDK that auto-instruments LLM providers, vector databases, and frameworks so traces flow into any existing observability backend.
Helicone	★ 5.9k	A proxy-based observability platform that logs, monitors, and evaluates LLM API calls by routing requests through its endpoint with one line of code.
AgentOps	★ 5.7k	An SDK for monitoring AI agents that tracks LLM cost, session replays, and performance across frameworks like CrewAI, LangChain, and the OpenAI Agents SDK.
Weave	★ 1.1k	Trace, evaluate, and debug LLM and agent apps with a Python decorator
