AI/TLDR

Verifiers

Build verifiable-reward environments to train and evaluate LLMs with reinforcement learning

Overview

Verifiers is a Python library from Prime Intellect for creating environments that train and evaluate large language models. An environment bundles everything a task needs: a dataset of inputs, a harness that gives the model its tools and context, and a rubric (reward function) that scores how well the model did.

It is aimed at researchers and engineers working on reinforcement-learning fine-tuning and alignment. Because the reward is defined as code you control, the same environment can be reused for RL training, capability evaluation, synthetic-data generation, and experimenting with agent harnesses.

Within the RLHF and alignment space, Verifiers focuses on the environment and reward side rather than the trainer. It integrates with Prime Intellect's Environments Hub and the prime-rl training framework, and environments are self-contained Python modules you can share and run.

What it does

  • Environments bundle a dataset, a model harness (tools, sandboxes, context management), and a rubric reward function
  • Environments are self-contained Python modules that expose a load_environment function
  • Works for RL training, evaluation, synthetic-data generation, and agent-harness experiments
  • prime CLI scaffolds workspaces and environment templates (prime lab setup, prime env init)
  • Integrates with the Environments Hub and the prime-rl training framework
  • Includes a nano RL trainer (vf.RLTrainer) and multi-turn environment support

Getting started

Verifiers uses uv for dependencies and the prime CLI to set up workspaces and scaffold environments.

Install uv and the prime CLI

Install the uv package manager, then install and log in to the prime CLI.

bashbash
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# install the prime CLI
uv tool install prime
# log in to the Prime Intellect platform
prime login

Set up a workspace

Create a workspace, which sets up a uv project and installs verifiers. To add verifiers to an existing project instead, run uv add verifiers && prime lab setup --skip-install.

bashbash
prime lab setup

Initialize an environment template

Scaffold a new self-contained environment module under environments/.

bashbash
prime env init my-env

Define load_environment

Each environment module exposes a load_environment function that returns an environment object.

pythonpython
# my_env.py
import verifiers as vf

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
    dataset = vf.load_example_dataset(dataset_name)
    ...

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Fine-tune an LLM with reinforcement learning against a custom, code-defined reward
  • Build a reusable benchmark to evaluate model capabilities on a specific task
  • Generate synthetic training data by running models through a task environment
  • Prototype and test agent harnesses with tools and sandboxes before training

How Verifiers compares

Verifiers alongside other open-source rlhf & alignment tools AI/TLDR tracks, ranked by GitHub stars.

ToolStarsWhat it does
Open-R1★ 26.3kAn open reproduction of the DeepSeek-R1 reasoning pipeline, with scripts for GRPO training and reasoning-data generation.
verl★ 22.1kVolcano Engine's RL post-training framework (HybridFlow) for building GRPO, PPO, and other RL pipelines on top of FSDP, Megatron, and vLLM.
TRL★ 18.7kHugging Face's post-training library with trainers for SFT, reward modeling, DPO, PPO, and GRPO to align language models with preferences.
Agent Lightning★ 17.3kAn open-source trainer from Microsoft that improves AI agents built with any framework using reinforcement learning, prompt optimization, and supervised fine-tuning.
ART★ 10.1kOpenPipe's Agent Reinforcement Trainer for post-training LLM agents on multi-step tasks using GRPO and rule- or judge-based rewards.
OpenRLHF★ 9.7kA Ray- and vLLM-based RLHF framework that scales PPO, GRPO, and REINFORCE++ training to models with 70B+ parameters.
Alignment Handbook★ 5.6kA set of recipes and scripts from Hugging Face showing how to run the full SFT-then-preference-alignment pipeline used to build aligned chat models.
Verifiers★ 4.2kBuild verifiable-reward environments to train and evaluate LLMs with reinforcement learning