AI/TLDR

Hugging Face · 2026-03-31 · notable

TRL v1.0 — Hugging Face Post-Training Library Reaches Stable

First stable release of the dominant LLM post-training library. It ships 75+ post-training methods (including SFT, DPO, GRPO, and KTO), moves to a weekly release cadence, and integrates with Unsloth for up to 2x faster training. The library sees 3M monthly PyPI downloads.


The go-to library for aligning LLMs hits 1.0 with stable APIs and 75+ post-training methods.

Key specs

License: Apache 2.0
GitHub stars: 18k
PyPI downloads/month: 3M
Alignment methods: 75+

What is it?

TRL (Transformer Reinforcement Learning) is Hugging Face's library for post-training language models. The v1.0 release marks its transition from a research-oriented tool to a production-ready framework, with stable APIs, semantic versioning, and a clear separation between stable and experimental methods.

How does it work?

TRL provides trainers for supervised fine-tuning (SFT), reward modeling, and multiple alignment algorithms including DPO, GRPO, RLOO, and KTO. It integrates with Accelerate for multi-GPU/multi-node scaling and with Unsloth for up to 2x training speedup and 70% memory reduction. Starting with v1.0, minor releases ship weekly so new model support lands fast.
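To give a feel for what one of these alignment algorithms optimizes, here is a minimal sketch of the per-pair DPO loss in plain Python. It is an illustration of the published DPO objective, not TRL's implementation: the function names, the scalar log-probability inputs, and the default beta of 0.1 are all assumptions made for this example. In practice a trainer computes the same quantity over batches of token log-probabilities.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not taken from the article
) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more strongly
    # than the reference does, relative to the rejected response.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2), about 0.693.
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy has shifted probability toward the chosen response: lower loss.
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The reference model terms are what keep the policy from drifting arbitrarily far from its starting point; beta controls how strongly that drift is penalized.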

Why does it matter?

Post-training is the step that turns a pretrained model into something useful for a specific task. TRL 1.0 makes this step reliable enough for production workflows, with stable APIs that teams can build CI/CD pipelines around. At 3M monthly downloads, it is the dominant library in this space.

Who is it for?

ML engineers fine-tuning or aligning LLMs.

Try it

pip install --upgrade trl
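After upgrading, a pipeline may want to confirm that the installed TRL is on the stable 1.x line before relying on its APIs. The following is a minimal sketch using only the Python standard library; the helper names installed_version and is_stable_release are hypothetical, and the check assumes a plain "major.minor.patch" version string.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "trl") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_stable_release(version: Optional[str]) -> bool:
    """True if the major version is 1 or higher (assumes 'major.minor.patch')."""
    if not version:
        return False
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 1


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("trl is not installed")
    elif is_stable_release(version):
        print(f"trl {version} is on the stable 1.x+ line")
    else:
        print(f"trl {version} predates the 1.0 stable release")
```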


Tags

  • post-training
  • rlhf
  • dpo
  • grpo
  • fine-tuning
  • alignment
  • huggingface
