Overview
Agent Lightning is an open-source training framework from Microsoft Research that helps you improve AI agents after you have already built them. Instead of rewriting your agent, you keep it running as usual and let the framework collect what it does: every prompt, tool call, and reward. Those events become structured traces that a learning algorithm can use to make the agent better.
It works with almost any agent framework, including LangChain, the OpenAI Agents SDK, AutoGen, CrewAI, and Microsoft Agent Framework, or with no framework at all if you just call an LLM directly from Python. The project says you can turn an agent into something trainable with zero code change in most cases, and you can choose to optimize one agent or several inside a larger multi-agent system.
Agent Lightning supports several improvement methods, including reinforcement learning, automatic prompt optimization, and supervised fine-tuning. A central store keeps tasks, resources, and traces in sync, and a trainer ties everything together so improvements such as refined prompts or new model weights flow back into the running agent.
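To make the idea of structured traces concrete, here is a minimal sketch of how prompts, tool calls, and rewards from one agent run could be grouped in a central store. The `Event` and `InMemoryStore` names and their fields are assumptions made for illustration; they are not Agent Lightning's LightningStore API or its trace schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """One recorded step of an agent run (hypothetical schema)."""
    rollout_id: str
    kind: str  # "prompt", "tool_call", or "reward"
    payload: dict[str, Any] = field(default_factory=dict)


class InMemoryStore:
    """Toy stand-in for a central store; not the LightningStore API."""

    def __init__(self) -> None:
        # Events are kept in arrival order, grouped by rollout.
        self._events: dict[str, list[Event]] = defaultdict(list)

    def add(self, event: Event) -> None:
        self._events[event.rollout_id].append(event)

    def trace(self, rollout_id: str) -> list[Event]:
        """Return a copy of the ordered events recorded for one rollout."""
        return list(self._events[rollout_id])

    def total_reward(self, rollout_id: str) -> float:
        """Sum every reward event emitted during one rollout."""
        return sum(
            e.payload["value"]
            for e in self._events[rollout_id]
            if e.kind == "reward"
        )


store = InMemoryStore()
store.add(Event("r1", "prompt", {"text": "What is 2 + 2?"}))
store.add(Event("r1", "tool_call", {"name": "calculator", "result": "4"}))
store.add(Event("r1", "reward", {"value": 1.0}))

print([e.kind for e in store.trace("r1")])
print(store.total_reward("r1"))
```

Grouping events by rollout lets a learning algorithm read each run as one ordered trace with its reward attached, which is the shape most of the improvement methods listed above need.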
What it does
- Works with almost any agent stack: LangChain, OpenAI Agents SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python OpenAI calls
- Adds training with little to no code change to your existing agent
- Supports several methods: reinforcement learning, automatic prompt optimization, and supervised fine-tuning
- Lets you selectively optimize one or more agents inside a multi-agent system
- Captures prompts, tool calls, and rewards as structured traces through a central store (LightningStore)
- Released by Microsoft under the MIT license, with documentation and runnable examples
Getting started
Agent Lightning is a Python package published on PyPI. Install it, then follow the documentation and bundled examples to wire your existing agent into the trainer.
Install from PyPI
Install the stable release with pip.
pip install agentlightning
Or install the nightly build
If you want the newest features before they reach a stable release, install the pre-release build from Test PyPI.
pip install --upgrade --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --pre agentlightning
Connect your agent and train
Keep your agent running as it is. Drop in the lightweight agl.emit_xxx() helpers, or let the tracer collect each prompt, tool call, and reward automatically. Those events flow into the store, your chosen algorithm learns from them, and the trainer feeds improvements back to the agent. See the official documentation and the examples folder in the repository for complete, runnable setups.
Commands and code are distilled from the project's own documentation. Always check the official repo for the latest version.
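The flow described above (the agent emits events, the store collects them, an algorithm learns from them, and the improvement goes back to the agent) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the library's API: `fake_llm`, `run_agent`, `train`, and the `emit` callback are hypothetical names, the "model" is a stub, and the "algorithm" is a simple pick-the-best-prompt rule standing in for real prompt optimization.

```python
from statistics import mean

TASKS = [{"a": 2, "b": 3}, {"a": 10, "b": 5}, {"a": 7, "b": 8}]
CANDIDATE_PROMPTS = [
    "Add {a} and {b}.",
    "Add {a} and {b}. Reply with only the number.",
]


def fake_llm(prompt: str, a: int, b: int) -> str:
    """Stub model (hypothetical): verbose unless told to be terse."""
    if "only the number" in prompt:
        return str(a + b)
    return f"The answer is {a + b}."


def run_agent(template: str, task: dict, emit) -> None:
    """The 'agent': build a prompt, call the model, emit what happened."""
    prompt = template.format(**task)
    emit("prompt", prompt)
    answer = fake_llm(prompt, task["a"], task["b"])
    # Reward is 1.0 only when the output is exactly the expected number.
    reward = 1.0 if answer == str(task["a"] + task["b"]) else 0.0
    emit("reward", reward)


def train(candidates: list[str], tasks: list[dict]) -> str:
    """Run every candidate prompt, store its events, return the best one."""
    store: dict[str, list[tuple[str, object]]] = {}
    for template in candidates:
        events = store.setdefault(template, [])

        def emit(kind: str, value: object, _events=events) -> None:
            _events.append((kind, value))

        for task in tasks:
            run_agent(template, task, emit)

    def score(template: str) -> float:
        # Toy 'algorithm': mean reward over all stored reward events.
        return mean(v for k, v in store[template] if k == "reward")

    return max(candidates, key=score)


best_prompt = train(CANDIDATE_PROMPTS, TASKS)
print(best_prompt)
```

The agent function never sees the store or the scoring rule; it only calls `emit`. That separation is what allows an existing agent to stay largely unchanged while the training side evolves independently.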
When to use it
- Improve an existing LangChain, CrewAI, or AutoGen agent with reinforcement learning without rewriting it
- Train an agent to write and self-correct SQL queries using reward signals
- Tune just one agent inside a larger multi-agent system while leaving the others unchanged
- Apply automatic prompt optimization or supervised fine-tuning to a plain Python OpenAI agent
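For the SQL use case above, a reward signal can be as simple as checking whether the agent's query returns the same rows as a known-good query. The sketch below uses Python's standard `sqlite3` module. The `sql_reward` function, the `users` table, and the 0.0/1.0 scoring are assumptions chosen for illustration, not the reward used in the project's own examples.

```python
import sqlite3
from collections import Counter


def sql_reward(conn: sqlite3.Connection, candidate_sql: str, reference_sql: str) -> float:
    """Return 1.0 if the candidate query yields the same rows as the reference.

    Row order is ignored; invalid SQL earns 0.0 instead of raising.
    Run untrusted queries only against a throwaway database.
    """
    try:
        got = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return 0.0
    expected = conn.execute(reference_sql).fetchall()
    # Counter compares rows as multisets, so duplicates still matter.
    return 1.0 if Counter(got) == Counter(expected) else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ada", 36), ("Linus", 28), ("Grace", 45)],
)

reference = "SELECT name FROM users WHERE age > 30"
print(sql_reward(conn, "SELECT name FROM users WHERE age >= 31", reference))  # same rows
print(sql_reward(conn, "SELECT name FROM users", reference))                  # extra row
print(sql_reward(conn, "SELEC name FROM users", reference))                   # syntax error
```

Returning 0.0 for a query that fails to execute gives the agent a signal to self-correct on the next attempt rather than crashing the rollout.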
How Agent Lightning compares
Agent Lightning alongside the other open-source RLHF and alignment tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| Open-R1 | ★ 26.3k | An open reproduction of the DeepSeek-R1 reasoning pipeline, with scripts for GRPO training and reasoning-data generation. |
| verl | ★ 22.1k | Volcano Engine's RL post-training framework (HybridFlow) for building GRPO, PPO, and other RL pipelines on top of FSDP, Megatron, and vLLM. |
| TRL | ★ 18.7k | Hugging Face's post-training library with trainers for SFT, reward modeling, DPO, PPO, and GRPO to align language models with preferences. |
| Agent Lightning | ★ 17.3k | Train almost any AI agent with reinforcement learning and prompt optimization, with little to no code change. |
| ART | ★ 10.1k | OpenPipe's Agent Reinforcement Trainer for post-training LLM agents on multi-step tasks using GRPO and rule- or judge-based rewards. |
| OpenRLHF | ★ 9.7k | A Ray- and vLLM-based RLHF framework that scales PPO, GRPO, and REINFORCE++ training to models with 70B+ parameters. |
| Alignment Handbook | ★ 5.6k | A set of recipes and scripts from Hugging Face showing how to run the full SFT-then-preference-alignment pipeline used to build aligned chat models. |
| Verifiers | ★ 4.2k | A library for defining verifiable-reward environments and running reinforcement-learning fine-tuning of LLMs against those rewards. |