AI/TLDR

Agent Lightning

Train almost any AI agent with reinforcement learning and prompt optimization, with little to no code change

Overview

Agent Lightning is an open-source training framework from Microsoft Research that helps you improve AI agents after you have already built them. Instead of rewriting your agent, you keep it running as usual and let the framework collect what it does: every prompt, tool call, and reward. Those events become structured traces that a learning algorithm can use to make the agent better.

It works with almost any agent framework, including LangChain, the OpenAI Agents SDK, AutoGen, CrewAI, and Microsoft Agent Framework, or with no framework at all if you just call an LLM directly from Python. The project says you can turn an agent into something trainable with zero code change in most cases, and you can choose to optimize one agent or several inside a larger multi-agent system.

Agent Lightning supports several improvement methods, including reinforcement learning, automatic prompt optimization, and supervised fine-tuning. A central store keeps tasks, resources, and traces in sync, and a trainer ties everything together so improvements such as refined prompts or new model weights flow back into the running agent.
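The loop described above (agent runs, traces land in a store, an algorithm learns, improvements flow back) can be pictured with a small toy. This is only a sketch under stated assumptions: `ToyStore`, `run_agent`, and `train` are hypothetical names invented for illustration, not Agent Lightning's API, and the "algorithm" is a plain best-of-candidates prompt selection rather than any method the project ships.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a central store."""
    tasks: list[str]
    resources: dict[str, str]                    # e.g. the current prompt template
    traces: list[dict] = field(default_factory=list)


def run_agent(task: str, prompt_template: str) -> float:
    """Stand-in agent: returns a made-up reward instead of calling an LLM."""
    # Assumption for the demo: templates asking for step-by-step work score higher.
    return 1.0 if "step by step" in prompt_template else 0.3


def train(store: ToyStore, candidates: list[str]) -> str:
    """Try each candidate prompt on every task, then keep the best one."""
    for template in candidates:
        for task in store.tasks:
            reward = run_agent(task, template)
            store.traces.append({"task": task, "prompt": template, "reward": reward})

    # Group the recorded rewards by the prompt that produced them.
    rewards_by_prompt = defaultdict(list)
    for trace in store.traces:
        rewards_by_prompt[trace["prompt"]].append(trace["reward"])

    best = max(rewards_by_prompt, key=lambda p: mean(rewards_by_prompt[p]))
    store.resources["prompt_template"] = best    # the improvement flows back
    return best
```

The point of the toy is the separation of roles: the agent only produces behavior, the store only records it, and the training step is the only place that decides what to change.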

What it does

  • Works with almost any agent stack: LangChain, OpenAI Agents SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python OpenAI calls
  • Adds training with little to no code change to your existing agent
  • Supports several methods: reinforcement learning, automatic prompt optimization, and supervised fine-tuning
  • Lets you selectively optimize one or more agents inside a multi-agent system
  • Captures prompts, tool calls, and rewards as structured traces through a central store (LightningStore)
  • Released by Microsoft under the MIT license, with documentation and runnable examples
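To make "structured traces" and "selectively optimize" from the list above more concrete, here is one possible shape for a captured event and a filter that keeps only the agents chosen for training. The `TraceEvent` record and the `events_for` helper are hypothetical and are not taken from the LightningStore schema.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical record for one captured event."""
    rollout_id: str
    agent: str
    kind: Literal["prompt", "tool_call", "reward"]
    payload: object


def events_for(events: list[TraceEvent], agents: set[str]) -> list[TraceEvent]:
    """Keep only events produced by the agents selected for optimization."""
    return [event for event in events if event.agent in agents]


# A made-up rollout involving two agents; only "writer" is selected below.
demo_events = [
    TraceEvent("r1", "planner", "prompt", "Plan the query"),
    TraceEvent("r1", "writer", "prompt", "Write the SQL"),
    TraceEvent("r1", "writer", "tool_call", {"tool": "run_sql"}),
    TraceEvent("r1", "writer", "reward", 1.0),
]
writer_events = events_for(demo_events, {"writer"})
```

Filtering at the trace level is one simple way to leave the other agents in a system untouched: their events are still recorded, they are just never handed to the learning step.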

Getting started

Agent Lightning is a Python package published on PyPI. Install it, then follow the documentation and bundled examples to wire your existing agent into the trainer.

Install from PyPI

Install the stable release with pip.

```bash
pip install agentlightning
```

Or install the nightly build

If you want the newest features before they reach a stable release, install the pre-release build from Test PyPI.

```bash
pip install --upgrade \
  --index-url https://test.pypi.org/simple/ \
  --extra-index-url https://pypi.org/simple/ \
  --pre agentlightning
```
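After either install command, a quick way to confirm which build ended up in the environment is to ask Python's standard library for the installed distribution's version. The sketch below uses only `importlib.metadata`; the helper name `installed_version` is made up for this example, and the distribution name is the one used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "agentlightning") -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "agentlightning is not installed")
```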

Connect your agent and train

Keep your agent running as it is. Either drop in the lightweight agl.emit_xxx() helpers to report events yourself (xxx stands for the kind of event), or let the tracer collect each prompt, tool call, and reward automatically. Those events flow into the store, your chosen algorithm learns from them, and the trainer feeds the improvements back to the agent. See the official documentation and the examples folder in the repository for complete, runnable setups.

Commands and code are distilled from the project's own documentation. Always check the official repo for the latest.
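The two capture styles mentioned in this section, explicit helpers and automatic tracing, can be illustrated without the library. The following is a sketch, not the project's code: `EVENTS`, `traced`, and `record_reward` are hypothetical stand-ins written for this example and do not mirror the signatures of the real agl helpers or tracer.

```python
import functools
from typing import Any, Callable

EVENTS: list[dict[str, Any]] = []   # hypothetical in-memory event log


def traced(kind: str) -> Callable:
    """Decorator that records each call's inputs and output as one event."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)
            EVENTS.append({"kind": kind, "name": fn.__name__,
                           "args": args, "kwargs": kwargs, "output": result})
            return result
        return wrapper
    return decorator


def record_reward(value: float) -> None:
    """Explicit style: the agent code reports a reward itself."""
    EVENTS.append({"kind": "reward", "value": float(value)})


@traced("tool_call")
def add(a: int, b: int) -> int:
    """A trivial 'tool' the agent can call."""
    return a + b


def run_episode() -> None:
    answer = add(2, 3)                           # captured by the decorator
    record_reward(1.0 if answer == 5 else 0.0)   # reported explicitly
```

The automatic style keeps the agent code unchanged apart from the wrapping, while the explicit style is useful for signals such as rewards that only the agent's author knows how to compute.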

When to use it

  • Improve an existing LangChain, CrewAI, or AutoGen agent with reinforcement learning without rewriting it
  • Train an agent to write and self-correct SQL queries using reward signals
  • Tune just one agent inside a larger multi-agent system while leaving the others unchanged
  • Apply automatic prompt optimization or supervised fine-tuning to a plain Python OpenAI agent
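For the SQL use case in the list above, the reward signal could be as simple as running the generated query and comparing its rows with those of a reference query. The sketch below is one assumed design using only the standard `sqlite3` module; `sql_reward` and its parameters are hypothetical and are not drawn from the project's examples.

```python
import sqlite3
from collections import Counter


def sql_reward(candidate_sql: str, reference_sql: str, setup_sql: str) -> float:
    """Return 1.0 if the candidate query yields the same rows as the reference."""
    conn = sqlite3.connect(":memory:")           # fresh, isolated database per call
    try:
        conn.executescript(setup_sql)            # create tables and seed rows
        expected = conn.execute(reference_sql).fetchall()
        try:
            actual = conn.execute(candidate_sql).fetchall()
        except (sqlite3.Error, sqlite3.Warning):
            return 0.0                           # invalid SQL earns no reward
        # Compare as multisets so row order does not matter.
        return 1.0 if Counter(actual) == Counter(expected) else 0.0
    finally:
        conn.close()
```

A binary reward like this is easy to reason about but gives no partial credit. A training setup that wants the agent to self-correct might also pass the database error message back to the agent as feedback, which this sketch does not do.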

How Agent Lightning compares

Agent Lightning alongside the other open-source RLHF and alignment tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| Open-R1 | ★ 26.3k | An open reproduction of the DeepSeek-R1 reasoning pipeline, with scripts for GRPO training and reasoning-data generation. |
| verl | ★ 22.1k | Volcano Engine's RL post-training framework (HybridFlow) for building GRPO, PPO, and other RL pipelines on top of FSDP, Megatron, and vLLM. |
| TRL | ★ 18.7k | Hugging Face's post-training library with trainers for SFT, reward modeling, DPO, PPO, and GRPO to align language models with preferences. |
| Agent Lightning | ★ 17.3k | Train almost any AI agent with reinforcement learning and prompt optimization, with little to no code change. |
| ART | ★ 10.1k | OpenPipe's Agent Reinforcement Trainer for post-training LLM agents on multi-step tasks using GRPO and rule- or judge-based rewards. |
| OpenRLHF | ★ 9.7k | A Ray- and vLLM-based RLHF framework that scales PPO, GRPO, and REINFORCE++ training to models with 70B+ parameters. |
| Alignment Handbook | ★ 5.6k | A set of recipes and scripts from Hugging Face showing how to run the full SFT-then-preference-alignment pipeline used to build aligned chat models. |
| Verifiers | ★ 4.2k | A library for defining verifiable-reward environments and running reinforcement-learning fine-tuning of LLMs against those rewards. |