Opik

Open-source tracing, evaluation, and monitoring for LLM and agent apps

github.com/comet-ml/opik (★ 19.7k) · comet.com/site/products/opik

Overview

Opik is an open-source platform from Comet for developing, testing, and monitoring applications built on large language models. It records the calls your app makes to an LLM, stores them as traces, and lets you inspect what went in and what came back, from early prototyping through production.

It is aimed at developers and teams working on RAG chatbots, code assistants, and agent systems who need to see why an app behaves the way it does. Alongside tracing, it offers evaluation tools, including LLM-as-a-judge metrics for things like hallucination detection, moderation, and RAG answer relevance.

As an LLM observability tool, Opik pairs a self-hostable server (run locally with Docker Compose) with client SDKs and many framework integrations, so you can collect traces, score them, and watch feedback scores, trace counts, and token usage over time in a dashboard.

What it does

Tracing of LLM calls, conversations, and agent activity, viewable in a UI
LLM-as-a-judge metrics for hallucination detection, moderation, and RAG assessment (answer relevance, context precision)
Datasets and experiments to automate evaluation, plus a PyTest integration for CI/CD
Production monitoring dashboards with online evaluation rules; built to handle high trace volumes
Many third-party framework integrations, including Google ADK, Autogen, and Flowise AI
Self-hostable server via Docker Compose, plus the Opik Agent Optimizer and Guardrails
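
As a rough illustration of how a dataset, an experiment, and an LLM-as-a-judge metric fit together, here is a minimal sketch in plain Python. It does not use the Opik SDK: `JudgeScore`, `judge_hallucination`, `run_experiment`, `fake_task`, and `fake_judge` are hypothetical names invented for this example, and the judge is a string-matching stand-in rather than a real model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class JudgeScore:
    """Hypothetical container for one metric result."""
    name: str
    value: float
    reason: str


def judge_hallucination(output: str, context: list[str],
                        judge: Callable[[str], str]) -> JudgeScore:
    """Ask a judge (normally an LLM) whether the answer is unsupported by the context."""
    prompt = (
        "Context:\n" + "\n".join(context)
        + "\n\nAnswer:\n" + output
        + "\n\nReply with 1 if the answer makes claims the context does not support, else 0."
    )
    verdict = judge(prompt).strip()
    value = 1.0 if verdict.startswith("1") else 0.0
    return JudgeScore("hallucination", value, f"judge replied {verdict!r}")


def run_experiment(dataset, task, judge) -> float:
    """Run the task on every dataset item and return the mean hallucination score."""
    scores = []
    for item in dataset:
        output = task(item["input"])
        scores.append(judge_hallucination(output, item["context"], judge).value)
    return mean(scores)


# Stand-ins for a real application and a real judge model (assumptions for the demo).
def fake_task(question: str) -> str:
    return "Paris" if "France" in question else "Atlantis"


def fake_judge(prompt: str) -> str:
    # Toy rule: flag the answer when it does not appear anywhere in the context.
    context_part, answer_part = prompt.split("\n\nAnswer:\n")
    answer = answer_part.split("\n\n")[0]
    return "0" if answer in context_part else "1"


dataset = [
    {"input": "What is the capital of France?",
     "context": ["Paris is the capital of France."]},
    {"input": "What is the capital of Peru?",
     "context": ["Lima is the capital of Peru."]},
]

hallucination_rate = run_experiment(dataset, fake_task, fake_judge)
print(f"hallucination rate: {hallucination_rate}")
```

In a real setup the judge callable would wrap a model call and the scores would be stored alongside the traces rather than averaged in memory.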

Getting started

Install the Python SDK, optionally run the server locally, point the SDK at it with opik configure, then add a decorator to your LLM function to start logging traces.

Install the SDK

Install the Opik client with pip (or uv).

bash

pip install opik

Run the server locally (optional)

Clone the repo and start the self-hosted server with the install script; the UI runs at http://localhost:5173.

bash

git clone https://github.com/comet-ml/opik.git
cd opik
./opik.sh
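
Once the script has finished, a quick check that something is answering at the UI address can save a confusing configure step later. The sketch below uses only the Python standard library; `ui_is_reachable` is a hypothetical helper written for this example, not part of Opik, and it assumes the default address mentioned above.

```python
import urllib.error
import urllib.request


def ui_is_reachable(url: str = "http://localhost:5173", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with a 2xx or 3xx status."""
    # Bypass any configured HTTP proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Opik UI reachable:", ui_is_reachable())
```

A `False` result only says that nothing answered at that address in time; the containers may still be starting.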

Configure

Point the SDK at your local server (or Comet cloud).

bash

opik configure

Trace a function

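Add the @opik.track decorator to a function to log each call as a trace. The opik.configure(use_local=True) call points the SDK at the local server from code, as an alternative to the CLI step.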

python

import opik

opik.configure(use_local=True)

@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM code here
    return "Hello"

The Opik commands and SDK calls above are distilled from the project's own documentation; always check the official repo for the latest version.
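
To make the idea of "logging calls as traces" concrete, here is a toy decorator that records the same kind of information a tracing decorator typically captures: function name, inputs, output or error, and duration. This is an illustration only, not Opik's implementation; `toy_track` and the in-memory `TRACES` list are hypothetical names, and a real SDK would send the record to a server instead.

```python
import functools
import time
import uuid

TRACES: list[dict] = []  # in-memory stand-in for a trace store (assumption)


def toy_track(func):
    """Record name, inputs, output or error, and duration of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        record = {
            "id": uuid.uuid4().hex,
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
        }
        try:
            result = func(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a record.
            record["duration_s"] = time.perf_counter() - started
            TRACES.append(record)
    return wrapper


@toy_track
def my_llm_function(user_question: str) -> str:
    # Placeholder for a real LLM call.
    return "Hello"


my_llm_function("What is Opik?")
print(TRACES[0]["name"], "->", TRACES[0]["output"])
```

The `try`/`finally` shape matters: a failed call is often the one you most want to inspect, so the record is stored even when the wrapped function raises.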

When to use it

Debugging a RAG chatbot by inspecting each LLM call and its retrieved context in traces
Scoring outputs for hallucinations or moderation issues using LLM-as-a-judge metrics
Adding evaluation checks to a CI/CD pipeline with the PyTest integration
Monitoring trace counts, token usage, and feedback scores for an agent app in production
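
The monitoring case boils down to aggregating per-trace fields over time. A minimal sketch of that aggregation, assuming each trace has already been exported as a dictionary; the `day`, `total_tokens`, and `feedback` keys and the `summarize_by_day` function are hypothetical and do not reflect Opik's actual export format:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported trace records (made-up values for illustration).
traces = [
    {"day": "2024-01-01", "total_tokens": 120, "feedback": 1.0},
    {"day": "2024-01-01", "total_tokens": 80, "feedback": 0.0},
    {"day": "2024-01-02", "total_tokens": 200, "feedback": 1.0},
]


def summarize_by_day(records):
    """Group traces by day and compute count, token total, and mean feedback."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["day"]].append(record)

    summary = {}
    for day in sorted(grouped):
        items = grouped[day]
        summary[day] = {
            "trace_count": len(items),
            "total_tokens": sum(r["total_tokens"] for r in items),
            "mean_feedback": mean(r["feedback"] for r in items),
        }
    return summary


daily = summarize_by_day(traces)
for day, stats in daily.items():
    print(day, stats)
```

The same summary could back a CI check, for example by failing a test run when `mean_feedback` for the latest day drops below a threshold you choose.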

How Opik compares

Opik alongside the other open-source observability and LLMOps tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Langfuse	★ 29.4k	A self-hostable platform for tracing LLM and agent calls, managing prompts, and running evaluations to debug and improve AI applications.
Opik	★ 19.7k	Open-source tracing, evaluation, and monitoring for LLM and agent apps
TensorZero	★ 11.7k	An open-source LLMOps platform that puts a single gateway in front of every major LLM provider and adds observability, evaluation, optimization, and A/B testing.
Evidently	★ 7.6k	A monitoring and evaluation framework for ML and LLM systems that tracks output quality, drift, and test results over time with reports and dashboards.
OpenLLMetry	★ 7.2k	An OpenTelemetry-based SDK that auto-instruments LLM providers, vector databases, and frameworks so traces flow into any existing observability backend.
Helicone	★ 5.8k	A proxy-based observability platform that logs, monitors, and evaluates LLM API calls by routing requests through its endpoint with one line of code.
AgentOps	★ 5.6k	An SDK for monitoring AI agents that tracks LLM cost, session replays, and performance across frameworks like CrewAI, LangChain, and the OpenAI Agents SDK.
Pydantic Logfire	★ 4.3k	An observability platform from the Pydantic team that records LLM calls, agent runs, and tool invocations with tokens, cost, and latency attached.
