AI/TLDR

PEFT

Fine-tune large models by training tiny adapters instead of all the weights

Overview

PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that adapts large pretrained models by training only a small number of extra parameters instead of every weight in the model. This cuts the compute and storage needed for fine-tuning, while reaching quality close to a fully fine-tuned model.

It is aimed at developers and ML engineers who want to specialize a base model for a downstream task but do not have the hardware to fine-tune billions of parameters. With methods like LoRA, you can train a 3B-parameter model on a single GPU and save an adapter that is only a few megabytes instead of gigabytes.
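
To see why an adapter can be megabytes rather than gigabytes, here is a rough back-of-the-envelope sketch. LoRA adds two small matrices of rank r next to each wrapped weight matrix, so the extra parameter count per matrix is r * (d_in + d_out). The hidden size, layer count, and number of wrapped matrices below are hypothetical values picked for illustration, not the real shapes of any particular model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # LoRA learns A with shape (r, d_in) and B with shape (d_out, r)
    return r * (d_in + d_out)

# Hypothetical shapes, for illustration only
hidden = 2048            # width of each wrapped square matrix
num_layers = 36          # number of transformer blocks
matrices_per_layer = 2   # how many matrices per block get an adapter
rank = 16                # LoRA rank r

per_matrix = lora_param_count(hidden, hidden, rank)
adapter_params = per_matrix * matrices_per_layer * num_layers
full_params = hidden * hidden * matrices_per_layer * num_layers

adapter_mb = adapter_params * 4 / 1e6  # 4 bytes per float32 value

print(f"adapter parameters: {adapter_params:,}")
print(f"adapter size in float32: {adapter_mb:.1f} MB")
print(f"share of the wrapped matrices: {adapter_params / full_params:.2%}")
```

Under these assumptions the adapter holds a few million parameters, which is a small fraction of the matrices it sits beside. Real numbers depend on which modules you target and on the rank you choose.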

PEFT plugs into the rest of the Hugging Face stack: it works with Transformers for training and inference, Diffusers for managing adapters, and Accelerate for distributed training and inference on very large models.

What it does

  • Supports several PEFT methods, including LoRA, DoRA, IA3, and soft-prompt methods such as prompt tuning
  • Wraps any base model with get_peft_model so existing training loops keep working
  • Saves small adapter checkpoints (e.g. ~19MB) rather than full multi-GB model copies
  • Integrates with Transformers, Diffusers, and Accelerate out of the box
  • Combines with quantization (such as QLoRA) to train large models on consumer GPUs
  • Loads trained adapters for inference with PeftModel.from_pretrained

Getting started

Install PEFT, wrap a base model with a LoRA config to train an adapter, then reload that adapter for inference.

Install PEFT

Install the library from PyPI with pip.

bash
pip install peft

Wrap a model with a PEFT method

Load a base model and wrap it with a LoRA config using get_peft_model, then train and save the adapter. print_trainable_parameters shows how few parameters you are actually training.

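python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Use the active accelerator when this PyTorch build exposes torch.accelerator; otherwise assume CUDA
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"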
model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# train on your dataset, then save the adapter
model.save_pretrained("qwen2.5-3b-lora")
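
For intuition about what the LoRA adapter computes, here is a minimal pure-Python sketch of the idea, not PEFT's actual implementation. It assumes the standard LoRA formulation: the base weight W stays frozen, and the layer output becomes W x + (lora_alpha / r) * B (A x), where B starts at zero so the wrapped model initially behaves exactly like the base model. The toy sizes and helper names (matvec, lora_forward) are made up for this example.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update: W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4  # toy sizes, not a real model

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(r)]    # trainable, random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_init = lora_forward(W, A, B, x, alpha, r)
print("matches base model before training:", y_init == y_base)

# Stand-in for a training step: only the adapter changes, W is never touched
B[0][0] = 0.5
y_tuned = lora_forward(W, A, B, x, alpha, r)
print("output changed after updating the adapter:", y_tuned != y_base)
```

The toy values give the same alpha / r scale of 2 as the r=16, lora_alpha=32 config above. Only A and B would be written to disk by a save, which is why the adapter checkpoint stays small.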

Load the adapter for inference

Reload the base model and apply your saved adapter with PeftModel.from_pretrained, then generate as usual.

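python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Use the active accelerator when this PyTorch build exposes torch.accelerator; otherwise assume CUDA
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"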
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(model, "qwen2.5-3b-lora")

inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(**inputs.to(device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

  • Fine-tune a large language model for a specific task on a single consumer or cloud GPU instead of needing a full multi-GPU setup
  • Train and ship many small task-specific adapters from one shared base model to save storage
  • Combine LoRA with quantization (QLoRA) to fit and train a 7B-12B model that would otherwise run out of memory
  • Adapt Diffusers image models or Whisper-style speech models using lightweight adapters
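
As a rough guide to the quantization point above, the sketch below estimates how much memory the model weights alone occupy at different precisions. It is a simplification: it ignores activations, gradients, optimizer state, the adapter itself, and the small overhead of quantization metadata, so treat the numbers as a lower bound. The helper name and the byte table are assumptions made for this example.

```python
# Approximate storage cost per parameter at each precision
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for billions in (7, 12):
    num_params = billions * 1e9
    estimates = ", ".join(
        f"{dtype}: {weight_memory_gb(num_params, dtype):.1f} GB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"{billions}B parameters -> {estimates}")
```

Storing the frozen base weights in 4 bits takes about a quarter of the float16 footprint, which is what lets a model in this size range fit on a GPU where the float16 copy would not.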

How PEFT compares

PEFT alongside the other open-source fine-tuning frameworks that AI/TLDR tracks, ranked by GitHub stars.

Tool | Stars | What it does
LLaMA-Factory | ★ 72.3k | An end-to-end training suite with a web UI that covers pre-training, supervised fine-tuning, and RLHF for hundreds of LLMs and multimodal models.
Unsloth | ★ 66.9k | A library that speeds up LoRA and QLoRA fine-tuning while cutting memory use, aimed at training models on a single GPU.
PEFT | ★ 21.3k | Fine-tune large models by training tiny adapters instead of all the weights
FinGPT | ★ 20.5k | FinGPT is an open-source project of financial LLMs, fine-tuned with LoRA on news and tweet data for tasks like sentiment analysis, relation extraction, and stock-move forecasting.
ms-swift | ★ 14.6k | ModelScope's framework for fine-tuning and deploying 600+ LLMs and 300+ multimodal models, supporting PEFT and full-parameter SFT, DPO, and GRPO.
LitGPT | ★ 13.4k | An open-source toolkit from Lightning AI to pretrain, finetune, and serve 20+ large language models, each written from scratch for speed and full control.
Axolotl | ★ 12.1k | A config-driven tool for fine-tuning and post-training open LLMs that supports SFT, LoRA/QLoRA, DPO, GRPO, and multi-GPU training across many model families.
Ludwig | ★ 11.7k | Ludwig is a low-code framework that lets you train, fine-tune, and deploy LLMs, multimodal, and tabular models using a YAML config instead of boilerplate Python.