Hugging Face · 2026-03-31 · notable
TRL v1.0 — Hugging Face Post-Training Library Reaches Stable
First stable release of the dominant LLM post-training library. It ships 75+ post-training methods (including SFT, DPO, GRPO, and KTO), moves to weekly releases, integrates with Unsloth for up to 2x training speed, and draws 3M monthly PyPI downloads.

The go-to library for aligning LLMs hits 1.0 with stable APIs and 75+ post-training methods.
Key specs
| License | Apache 2.0 |
|---|---|
| GitHub stars | 18k |
| PyPI downloads/month | 3M |
| Alignment methods | 75+ |
What is it?
TRL (Transformer Reinforcement Learning) is Hugging Face's library for post-training language models. The v1.0 release marks the transition from research-oriented tool to production-ready framework with stable APIs, semantic versioning, and a clear separation between stable and experimental methods.
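Under semantic versioning, breaking changes to stable APIs are expected only with a major version bump, so a pipeline can gate on the installed major version. The sketch below shows one way a team might do that using only the Python standard library. The helper names `major_version` and `installed_major` are hypothetical, and gating on the 1.x line is an assumed team policy rather than something TRL prescribes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version_string.split(".")[0])


def installed_major(package: str = "trl") -> Optional[int]:
    """Return the installed major version of `package`, or None if it is absent."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        return None
```

A CI step could then fail fast when `installed_major()` is not `1`, before any training job starts.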
How does it work?
TRL provides trainers for supervised fine-tuning (SFT), reward modeling, and multiple alignment algorithms including DPO, GRPO, RLOO, and KTO. It integrates with Accelerate for multi-GPU/multi-node scaling and with Unsloth for up to 2x training speedup and 70% memory reduction. Starting with v1.0, minor releases ship weekly so new model support lands fast.
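To make one of the listed algorithms concrete, here is the standard DPO objective for a single preference pair, written in plain Python. It is an illustration of the published loss, not TRL's implementation: the function names are hypothetical, the inputs are assumed to be summed token log-probabilities of each response, and `beta=0.1` is only an example value.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair of responses."""
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response relative to the rejected one.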
Why does it matter?
Post-training is the step that turns a pretrained model into something useful for a specific task. TRL 1.0 makes this step reliable enough for production workflows, with stable APIs that teams can build CI/CD pipelines around. At 3M monthly downloads, it is the dominant library in this space.
Who is it for?
ML engineers fine-tuning or aligning LLMs.
Try it
pip install --upgrade trl
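
After installing, a short supervised fine-tuning run might look like the sketch below. It assumes v1.0 keeps the `SFTTrainer` and `SFTConfig` interface from recent pre-1.0 releases and that the `datasets` package is installed. The model `Qwen/Qwen2.5-0.5B`, the dataset `trl-lib/Capybara`, the output directory, and the wrapper name `run_sft_demo` are illustrative choices, not part of the release notes.

```python
def run_sft_demo(
    model_name: str = "Qwen/Qwen2.5-0.5B",
    dataset_name: str = "trl-lib/Capybara",
    output_dir: str = "sft-demo",
    max_steps: int = 10,
) -> None:
    """Run a few SFT steps as a smoke test (downloads the model and dataset)."""
    # Imported inside the function so this file loads even without the heavy dependencies.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset(dataset_name, split="train")
    # max_steps keeps the run short; drop it for a real training job.
    config = SFTConfig(output_dir=output_dir, max_steps=max_steps)
    trainer = SFTTrainer(model=model_name, args=config, train_dataset=dataset)
    trainer.train()
```

Calling `run_sft_demo()` starts the run; a GPU is strongly recommended even for a model of this size.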