Agent Lightning

Train almost any AI agent with reinforcement learning and prompt optimization, with little to no code change

github.com/microsoft/agent-lightning (★ 17.3k) | microsoft.github.io/agent-lightning

Overview

Agent Lightning is an open-source training framework from Microsoft Research that helps you improve AI agents after you have already built them. Instead of rewriting your agent, you keep it running as usual and let the framework collect what it does: every prompt, tool call, and reward. Those events become structured traces that a learning algorithm can use to make the agent better.

It works with almost any agent framework, including LangChain, the OpenAI Agents SDK, AutoGen, CrewAI, and Microsoft Agent Framework, or with no framework at all if you just call an LLM directly from Python. The project says you can turn an agent into something trainable with zero code change in most cases, and you can choose to optimize one agent or several inside a larger multi-agent system.

Agent Lightning supports several improvement methods, including reinforcement learning, automatic prompt optimization, and supervised fine-tuning. A central store keeps tasks, resources, and traces in sync, and a trainer ties everything together so improvements such as refined prompts or new model weights flow back into the running agent.

What it does

Works with almost any agent stack: LangChain, OpenAI Agents SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python OpenAI calls
Adds training with little to no code change to your existing agent
Supports several methods: reinforcement learning, automatic prompt optimization, and supervised fine-tuning
Lets you selectively optimize one or more agents inside a multi-agent system
Captures prompts, tool calls, and rewards as structured traces through a central store (LightningStore)
Released by Microsoft under the MIT license, with documentation and runnable examples

Getting started

Agent Lightning is a Python package published on PyPI. Install it, then follow the documentation and bundled examples to wire your existing agent into the trainer.

Install from PyPI

Install the stable release with pip.

pip install agentlightning

Or install the nightly build

If you want the newest features before they reach a stable release, install the pre-release build from Test PyPI.

pip install --upgrade --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --pre agentlightning
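
After either install, you may want to confirm that the package landed in the environment you expect. The snippet below is a minimal sketch that uses only the Python standard library; the helper name installed_version is made up for this example and is not part of Agent Lightning.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "agentlightning") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    installed_version is a hypothetical helper written for this example;
    it only queries package metadata and does not import the package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("agentlightning is not installed in this environment")
    else:
        print(f"agentlightning {found} is installed")
```

Because the check reads metadata rather than importing the package, it works the same way for the stable release and the pre-release build.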

Connect your agent and train

Keep your agent running as it is. Drop in the lightweight agl.emit_xxx() helpers, or let the tracer collect each prompt, tool call, and reward automatically. Those events flow into the store, your chosen algorithm learns from them, and the trainer feeds improvements back to the agent. See the official documentation and the examples folder in the repository for complete, runnable setups.

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.
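
To make the flow of events into a store more concrete, here is a toy sketch in plain Python. It does not use the Agent Lightning API: TraceEvent, InMemoryStore, toy_agent, and mean_reward are hypothetical names invented for illustration, and the "agent" is a trivial calculator instead of an LLM. It only shows the shape of the idea, where each rollout emits a prompt, a tool call, and a reward, and a learner later reads those traces back.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TraceEvent:
    """One recorded step of a rollout (hypothetical structure)."""
    rollout_id: str
    kind: str      # "prompt", "tool_call", or "reward"
    payload: Any


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a central trace store."""
    events: List[TraceEvent] = field(default_factory=list)

    def emit(self, rollout_id: str, kind: str, payload: Any) -> None:
        self.events.append(TraceEvent(rollout_id, kind, payload))

    def rollout(self, rollout_id: str) -> List[TraceEvent]:
        return [e for e in self.events if e.rollout_id == rollout_id]


def calculator(a: int, op: str, b: int) -> Optional[int]:
    """A tiny 'tool' that only knows addition and subtraction."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return None  # unsupported operator


def toy_agent(store: InMemoryStore, rollout_id: str, question: str, expected: int) -> Optional[int]:
    """Run one task and record its prompt, tool call, and reward."""
    store.emit(rollout_id, "prompt", f"Answer with a number only: {question}")
    a, op, b = question.split()
    result = calculator(int(a), op, int(b))
    store.emit(rollout_id, "tool_call", {"name": "calculator", "result": result})
    store.emit(rollout_id, "reward", 1.0 if result == expected else 0.0)
    return result


def mean_reward(store: InMemoryStore) -> float:
    """Average reward over all rollouts: the signal a learner would try to raise."""
    rewards = [e.payload for e in store.events if e.kind == "reward"]
    return sum(rewards) / len(rewards) if rewards else 0.0


if __name__ == "__main__":
    store = InMemoryStore()
    toy_agent(store, "r1", "2 + 3", expected=5)
    toy_agent(store, "r2", "4 * 5", expected=20)  # tool cannot multiply, so reward is 0
    for event in store.events:
        print(event)
    print("mean reward:", mean_reward(store))
```

In a real setup the store, tracer, and algorithm come from the framework itself, so treat this purely as a mental model and rely on the official documentation for the actual calls.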

When to use it

Improve an existing LangChain, CrewAI, or AutoGen agent with reinforcement learning without rewriting it
Train an agent to write and self-correct SQL queries using reward signals
Tune just one agent inside a larger multi-agent system while leaving the others unchanged
Apply automatic prompt optimization or supervised fine-tuning to a plain Python OpenAI agent
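
For the SQL use case above, a reward signal can be as simple as checking whether the agent's query returns the same rows as a known-good query. The sketch below is one possible way to do that with the standard-library sqlite3 module; sql_reward and its parameters are hypothetical names for this example and are not taken from the Agent Lightning codebase or its examples.

```python
import sqlite3


def sql_reward(candidate_sql: str, reference_sql: str, setup_sql: str) -> float:
    """Return 1.0 if both queries yield the same rows on a scratch database, else 0.0.

    sql_reward is a hypothetical helper. Row order is ignored, which is an
    assumption that suits questions where ordering does not matter.
    """
    conn = sqlite3.connect(":memory:")  # throwaway database for each check
    try:
        conn.executescript(setup_sql)
        expected = sorted(conn.execute(reference_sql).fetchall(), key=repr)
        try:
            actual = sorted(conn.execute(candidate_sql).fetchall(), key=repr)
        except (sqlite3.Error, sqlite3.Warning):
            return 0.0  # invalid or rejected SQL earns no reward
        return 1.0 if actual == expected else 0.0
    finally:
        conn.close()


if __name__ == "__main__":
    setup = (
        "CREATE TABLE users (id INTEGER, name TEXT);"
        "INSERT INTO users VALUES (1, 'ada'), (2, 'bob');"
    )
    reference = "SELECT name FROM users"
    print(sql_reward("SELECT name FROM users ORDER BY id DESC", reference, setup))
    print(sql_reward("SELEC name FROM users", reference, setup))
```

A binary reward like this is easy to reason about but gives the learner no partial credit; a graded score, for example based on row overlap, is a common alternative when queries are often nearly right.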

How Agent Lightning compares

Agent Lightning alongside the other open-source RLHF and alignment tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Open-R1	★ 26.3k	An open reproduction of the DeepSeek-R1 reasoning pipeline, with scripts for GRPO training and reasoning-data generation.
verl	★ 22.1k	Volcano Engine's RL post-training framework (HybridFlow) for building GRPO, PPO, and other RL pipelines on top of FSDP, Megatron, and vLLM.
TRL	★ 18.7k	Hugging Face's post-training library with trainers for SFT, reward modeling, DPO, PPO, and GRPO to align language models with preferences.
Agent Lightning	★ 17.3k	Train almost any AI agent with reinforcement learning and prompt optimization, with little to no code change
ART	★ 10.1k	OpenPipe's Agent Reinforcement Trainer for post-training LLM agents on multi-step tasks using GRPO and rule- or judge-based rewards.
OpenRLHF	★ 9.7k	A Ray- and vLLM-based RLHF framework that scales PPO, GRPO, and REINFORCE++ training to models with 70B+ parameters.
Alignment Handbook	★ 5.6k	A set of recipes and scripts from Hugging Face showing how to run the full SFT-then-preference-alignment pipeline used to build aligned chat models.
Verifiers	★ 4.2k	A library for defining verifiable-reward environments and running reinforcement-learning fine-tuning of LLMs against those rewards.
