Verifiers

Build verifiable-reward environments to train and evaluate LLMs with reinforcement learning

github.com/PrimeIntellect-ai/verifiers (★ 4.2k)

Overview

Verifiers is a Python library from Prime Intellect for creating environments that train and evaluate large language models. An environment bundles everything a task needs: a dataset of inputs, a harness that gives the model its tools and context, and a rubric (reward function) that scores how well the model did.
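The three parts of an environment can be pictured with a small, library-free sketch. Everything below (ToyEnvironment, Example, exact_match, add_instructions, toy_model) is a hypothetical stand-in written for illustration and is not the Verifiers API; it only shows how a dataset, a harness, and a rubric might fit together.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins for illustration; these are not the verifiers API.
@dataclass
class Example:
    prompt: str
    answer: str


@dataclass
class ToyEnvironment:
    dataset: list[Example]               # task inputs with reference answers
    harness: Callable[[str], str]        # adds context to the prompt before the model sees it
    rubric: Callable[[str, str], float]  # scores a completion against the reference answer

    def evaluate(self, model: Callable[[str], str]) -> float:
        """Run the model on every example and return the mean reward."""
        rewards = []
        for example in self.dataset:
            completion = model(self.harness(example.prompt))
            rewards.append(self.rubric(completion, example.answer))
        return sum(rewards) / len(rewards)


def exact_match(completion: str, answer: str) -> float:
    """Reward 1.0 when the stripped completion equals the answer, else 0.0."""
    return 1.0 if completion.strip() == answer else 0.0


def add_instructions(prompt: str) -> str:
    """A trivial harness: prepend an instruction to the task prompt."""
    return f"Answer with a number only.\n{prompt}"


def toy_model(prompt: str) -> str:
    """A fake model that only knows one sum."""
    return "4" if "2 + 2" in prompt else "0"


env = ToyEnvironment(
    dataset=[Example("What is 2 + 2?", "4"), Example("What is 3 + 5?", "8")],
    harness=add_instructions,
    rubric=exact_match,
)
mean_reward = env.evaluate(toy_model)
print(mean_reward)
```

Because the rubric is an ordinary function, the same object could serve an evaluation run or supply rewards to a trainer, which is the reuse the overview describes.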

It is aimed at researchers and engineers working on reinforcement-learning fine-tuning and alignment. Because the reward is defined as code you control, the same environment can be reused for RL training, capability evaluation, synthetic-data generation, and experimenting with agent harnesses.

Within the RLHF and alignment space, Verifiers focuses on the environment and reward side rather than the trainer. It integrates with Prime Intellect's Environments Hub and the prime-rl training framework, and environments are self-contained Python modules you can share and run.

What it does

Environments bundle a dataset, a model harness (tools, sandboxes, context management), and a rubric reward function
Environments are self-contained Python modules that expose a load_environment function
Works for RL training, evaluation, synthetic-data generation, and agent-harness experiments
prime CLI scaffolds workspaces and environment templates (prime lab setup, prime env init)
Integrates with the Environments Hub and the prime-rl training framework
Includes a nano RL trainer (vf.RLTrainer) and multi-turn environment support

Getting started

Verifiers uses uv for dependencies and the prime CLI to set up workspaces and scaffold environments.

Install uv and the prime CLI

Install the uv package manager, then install and log in to the prime CLI.

bash

# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# install the prime CLI
uv tool install prime
# log in to the Prime Intellect platform
prime login

Set up a workspace

Create a workspace, which sets up a uv project and installs verifiers. To add verifiers to an existing project instead, run uv add verifiers && prime lab setup --skip-install.

bash

prime lab setup

Initialize an environment template

Scaffold a new self-contained environment module under environments/.

bash

prime env init my-env

Define load_environment

Each environment module exposes a load_environment function that returns an environment object.

python

# my_env.py
import verifiers as vf

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
    dataset = vf.load_example_dataset(dataset_name)
    ...

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.
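A shared entry-point name is what lets tooling load any environment the same way. The sketch below shows one way such a convention could be consumed with the standard library alone. It is an assumption for illustration, not how Verifiers or the prime CLI actually resolves environments: load_by_name is a hypothetical helper, the in-memory my_env module is a placeholder, and a plain dict stands in for the environment object.

```python
import importlib
import sys
import types

# Build a throwaway module in memory so the sketch runs without any files.
# "my_env" and its return value are hypothetical placeholders.
toy_module = types.ModuleType("my_env")


def _load_environment(dataset_name: str = "gsm8k") -> dict:
    # A real environment module would return an environment object;
    # a dict stands in for it here.
    return {"dataset_name": dataset_name}


toy_module.load_environment = _load_environment
sys.modules["my_env"] = toy_module


def load_by_name(module_name: str, **env_args):
    """Import a module by name and call its load_environment entry point."""
    module = importlib.import_module(module_name)
    loader = getattr(module, "load_environment", None)
    if not callable(loader):
        raise AttributeError(f"{module_name} does not expose load_environment")
    return loader(**env_args)


default_env = load_by_name("my_env")
custom_env = load_by_name("my_env", dataset_name="toy-dataset")
print(default_env, custom_env)
```

Keyword arguments pass straight through to load_environment, so defaults such as dataset_name can be overridden without editing the module.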

When to use it

Fine-tune an LLM with reinforcement learning against a custom, code-defined reward
Build a reusable benchmark to evaluate model capabilities on a specific task
Generate synthetic training data by running models through a task environment
Prototype and test agent harnesses with tools and sandboxes before training

How Verifiers compares

Verifiers alongside the other open-source RLHF and alignment tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
Open-R1	★ 26.3k	An open reproduction of the DeepSeek-R1 reasoning pipeline, with scripts for GRPO training and reasoning-data generation.
verl	★ 22.1k	Volcano Engine's RL post-training framework (HybridFlow) for building GRPO, PPO, and other RL pipelines on top of FSDP, Megatron, and vLLM.
TRL	★ 18.7k	Hugging Face's post-training library with trainers for SFT, reward modeling, DPO, PPO, and GRPO to align language models with preferences.
Agent Lightning	★ 17.3k	An open-source trainer from Microsoft that improves AI agents built with any framework using reinforcement learning, prompt optimization, and supervised fine-tuning.
ART	★ 10.1k	OpenPipe's Agent Reinforcement Trainer for post-training LLM agents on multi-step tasks using GRPO and rule- or judge-based rewards.
OpenRLHF	★ 9.7k	A Ray- and vLLM-based RLHF framework that scales PPO, GRPO, and REINFORCE++ training to models with 70B+ parameters.
Alignment Handbook	★ 5.6k	A set of recipes and scripts from Hugging Face showing how to run the full SFT-then-preference-alignment pipeline used to build aligned chat models.
Verifiers	★ 4.2k	Build verifiable-reward environments to train and evaluate LLMs with reinforcement learning.
