LiveKit Agents

Build realtime voice AI agents that can see, hear, and speak

github.com/livekit/agents ★ 11.1k · docs.livekit.io/agents

Overview

LiveKit Agents is an open-source framework for building realtime, programmable participants that run on your servers. You use it to create conversational, multimodal voice agents that can see, hear, and understand a user, then respond in a natural back-and-forth conversation.

An agent is wired together from a speech-to-text model, an LLM, and a text-to-speech model, and you are free to mix and match providers to fit your use case. The framework handles the realtime media plumbing through LiveKit's WebRTC stack, so an agent can talk to web and mobile clients or even take phone calls through the telephony integration.
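
To make the STT, LLM, TTS wiring concrete, here is a toy sketch in plain Python. It does not use the LiveKit API: `VoicePipeline`, `fake_stt`, `echo_llm`, `shouting_llm`, and `fake_tts` are hypothetical stand-ins, and UTF-8 bytes stand in for audio so the example runs offline. It only illustrates why each stage can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins, not LiveKit types: each stage is a plain function.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio: bytes) -> bytes:
        transcript = self.stt(audio)  # audio -> text
        reply = self.llm(transcript)  # text -> text
        return self.tts(reply)        # text -> audio


# Toy providers that treat UTF-8 bytes as "audio".
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def shouting_llm(prompt: str) -> str:
    return prompt.upper()


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=echo_llm, tts=fake_tts)
print(pipeline.respond(b"hello"))  # b'You said: hello'
```

Swapping one provider means passing a different callable, for example `VoicePipeline(stt=fake_stt, llm=shouting_llm, tts=fake_tts)`, while the other two stages stay untouched. In the real framework the realtime media handling sits around these stages; this sketch leaves it out entirely.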

There is also a matching JavaScript and TypeScript library, AgentsJS, for teams that prefer to build in that ecosystem.

What it does

Flexible integrations: a broad ecosystem of plug-ins lets you mix and match STT, LLM, TTS, and realtime API providers to suit your use case.
Built-in job scheduling and dispatch APIs that connect end users to the right agent process.
Telephony integration so your agent can place and receive phone calls through LiveKit's SIP stack.
Semantic turn detection using a transformer model that knows when a user has finished speaking, reducing interruptions.
Native MCP support so you can add tools from MCP servers with a single line of code.
A built-in test framework with judges so you can check that an agent behaves as expected despite non-deterministic LLM output.

Getting started

LiveKit Agents is a Python library installed from PyPI. Install the core library together with the provider plug-ins you want, then write an entrypoint that starts an AgentSession. You will need LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET set in your environment.

Install the library with plug-ins

Install the core Agents library along with plug-ins for popular model providers using pip.

bash

pip install "livekit-agents[openai,silero,deepgram,cartesia,turn-detector]"

Define a simple voice agent

Create an AgentServer, register an rtc_session entrypoint, and start an AgentSession that combines STT, an LLM, and TTS. The agent below greets the user and can look up the weather through a function tool.

python

from livekit.agents import (
    Agent, AgentServer, AgentSession, JobContext,
    RunContext, cli, function_tool, inference,
)
from livekit.plugins import silero


@function_tool
async def lookup_weather(context: RunContext, location: str):
    """Used to look up weather information."""
    # Stubbed result for the demo; a real tool would query a weather service.
    return {"weather": "sunny", "temperature": 70}


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    # Combine voice activity detection, speech-to-text, an LLM, and text-to-speech.
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
    )
    # The agent carries the system instructions and the tools it may call.
    agent = Agent(
        instructions="You are a friendly voice assistant built by LiveKit.",
        tools=[lookup_weather],
    )
    # Start the session in the job's room, then have the agent speak first.
    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")


if __name__ == "__main__":
    cli.run_app(server)

Set your environment variables

Provide the LiveKit connection details before running the example. These point the agent at your LiveKit server and authenticate it.

bash

export LIVEKIT_URL=...
export LIVEKIT_API_KEY=...
export LIVEKIT_API_SECRET=...
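
A missing variable is easy to overlook, so it can help to check for all three before starting the agent. The sketch below uses only the standard library; `missing_livekit_vars` is a hypothetical helper name, not part of LiveKit Agents, and it treats an empty value as missing.

```python
import os

# Variable names taken from the section above.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env=None):
    """Return the required variable names that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demo with an explicit mapping (placeholder URL) instead of the real environment.
print(missing_livekit_vars({"LIVEKIT_URL": "wss://example.invalid"}))
# ['LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET']
```

Calling `missing_livekit_vars()` with no argument inspects the current process environment, so one option is to run it at the top of the agent script and stop with a clear message if the returned list is not empty.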

Commands and code are distilled from the project's own documentation; always check the official repository for the latest version.

When to use it

Build conversational voice assistants for web and mobile apps that respond in realtime over WebRTC.
Create phone-based agents that make or take calls through LiveKit's telephony stack, for support lines or outbound calling.
Design multi-agent experiences where one agent hands off the conversation to another specialized agent mid-session.
Write automated tests with built-in judges to keep agent behavior reliable despite non-deterministic LLM responses.

How LiveKit Agents compares

LiveKit Agents alongside the other open-source agent frameworks and builder tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
AutoGPT	★ 185k	One of the earliest autonomous agent projects, now a platform for building and running agents from reusable blocks and workflows.
Agno	★ 40.8k	A fast Python framework (formerly Phidata) for building agents with memory, tools, and multimodal inputs, plus a runtime for deploying them in production.
AgentGPT	★ 36.2k	AgentGPT lets you name a custom AI, give it a goal, and watch it plan tasks, run them, and learn from the results, all from a web browser.
LangGraph	★ 35.3k	A library from the LangChain team for building stateful, graph-based agent workflows with explicit control over steps, memory, and human-in-the-loop checkpoints.
Composio	★ 28.9k	Composio is an open-source SDK for Python and TypeScript that gives AI agents ready-made tools to act on real apps and APIs across many agent frameworks.
Semantic Kernel	★ 28.2k	Microsoft's SDK for adding agents, plugins, and planning to apps across .NET, Python, and Java.
smolagents	★ 27.9k	A minimal agent library from Hugging Face where the model writes and runs Python code to call tools and complete tasks.
LiveKit Agents	★ 11.1k	Build realtime voice AI agents that can see, hear, and speak
