AI/TLDR

Alignment Handbook

Hugging Face recipes and scripts for the full SFT-then-preference-alignment pipeline

Overview

The Alignment Handbook is a collection of training recipes and scripts from Hugging Face for building aligned chat models. It walks through the whole pipeline used to produce models like Zephyr-7B: continued pretraining, supervised fine-tuning (SFT) for chat, and preference alignment with DPO or ORPO.

It is aimed at ML engineers and researchers who want to reproduce known aligned models or train their own on custom datasets. Each training run is described by a single YAML recipe file that holds all the parameters, so you change configs rather than rewriting code.

Within the fine-tuning and RLHF/alignment space, it sits one level above raw training libraries: the scripts are thin wrappers over the Hugging Face stack, and they support full-weight distributed training with DeepSpeed ZeRO-3 as well as parameter-efficient LoRA/QLoRA.

What it does

  • Covers the full pipeline: continued pretraining, SFT, reward modeling, rejection sampling, DPO, and ORPO
  • Reproducible recipes as YAML files (e.g. Zephyr-7B, SmolLM, StarChat2) that hold every parameter for a run
  • Ready-made scripts for SFT (sft.py) and preference alignment (dpo.py) launched with accelerate
  • Supports full-weight distributed training via DeepSpeed ZeRO-3, plus LoRA/QLoRA for parameter-efficient tuning
  • Instructions and formatting guidance for fine-tuning chat models on your own datasets
  • Built on the Hugging Face ecosystem with documented dataset and model collections

Getting started

Set up a Python environment with the pinned dependencies, then launch a training run from one of the provided recipe YAML files. The example below reproduces the Zephyr-7B-beta pipeline.

Create a virtual environment and install dependencies

Use uv to create an environment, install the pinned PyTorch build, then install the handbook package and Flash Attention 2.

bashbash
uv venv handbook --python 3.11 && source handbook/bin/activate && uv pip install --upgrade pip
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
uv pip install .
uv pip install "flash-attn==2.7.4.post1" --no-build-isolation

Log in to Hugging Face

Authenticate so you can pull base models and datasets and push your trained model.

bashbash
huggingface-cli login

Run supervised fine-tuning (SFT)

Launch the SFT script with a recipe config, using the ZeRO-3 accelerate config for distributed training.

bashbash
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/sft.py --config recipes/zephyr-7b-beta/sft/config_full.yaml

Align with DPO

Take the SFT model and align it to preferences using direct preference optimization.

bashbash
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/dpo.py --config recipes/zephyr-7b-beta/dpo/config_full.yaml

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Reproduce a published aligned chat model such as Zephyr-7B from its recipe
  • Fine-tune and align an open base model on your own instruction and preference datasets
  • Compare alignment methods (DPO vs. KTO vs. IPO, or ORPO) using the included recipes
  • Adapt a model to a new language or domain through continued pretraining, then SFT and DPO

How Alignment Handbook compares

Alignment Handbook alongside other open-source rlhf & alignment tools AI/TLDR tracks, ranked by GitHub stars.

ToolStarsWhat it does
Open-R1★ 26.3kAn open reproduction of the DeepSeek-R1 reasoning pipeline, with scripts for GRPO training and reasoning-data generation.
verl★ 22.1kVolcano Engine's RL post-training framework (HybridFlow) for building GRPO, PPO, and other RL pipelines on top of FSDP, Megatron, and vLLM.
TRL★ 18.7kHugging Face's post-training library with trainers for SFT, reward modeling, DPO, PPO, and GRPO to align language models with preferences.
Agent Lightning★ 17.3kAn open-source trainer from Microsoft that improves AI agents built with any framework using reinforcement learning, prompt optimization, and supervised fine-tuning.
ART★ 10.1kOpenPipe's Agent Reinforcement Trainer for post-training LLM agents on multi-step tasks using GRPO and rule- or judge-based rewards.
OpenRLHF★ 9.7kA Ray- and vLLM-based RLHF framework that scales PPO, GRPO, and REINFORCE++ training to models with 70B+ parameters.
Alignment Handbook★ 5.6kHugging Face recipes and scripts for the full SFT-then-preference-alignment pipeline
Verifiers★ 4.2kA library for defining verifiable-reward environments and running reinforcement-learning fine-tuning of LLMs against those rewards.