Overview
PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that adapts large pretrained models by training only a small number of extra parameters instead of every weight in the model. This cuts the compute and storage needed for fine-tuning while often reaching quality comparable to a fully fine-tuned model.
It is aimed at developers and ML engineers who want to specialize a base model for a downstream task but do not have the hardware to fine-tune billions of parameters. With methods like LoRA, you can train a 3B-parameter model on a single GPU and save an adapter that is only a few megabytes instead of gigabytes.
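To make the size difference concrete, here is a rough back-of-the-envelope sketch of how many parameters LoRA adds. It uses the standard LoRA shape (two low-rank matrices per adapted layer). The model dimensions, the number of adapted layers, and the helper names are hypothetical values chosen for illustration, not figures from any particular model or from PEFT itself.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def adapter_size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate adapter size in MB (10^6 bytes), assuming 32-bit floats by default."""
    return n_params * bytes_per_param / 1e6


# Hypothetical model shape, for illustration only
hidden = 4096          # width of each adapted square projection
n_layers = 32          # number of transformer blocks
adapted_per_layer = 2  # assume two projections per block receive an adapter

full_layer = hidden * hidden                          # weights in one frozen projection
lora_layer = lora_param_count(hidden, hidden, r=16)   # extra trainable weights for it
total_lora = lora_layer * adapted_per_layer * n_layers

print(f"one projection: {full_layer:,} frozen vs {lora_layer:,} trainable")
print(f"whole adapter: {total_lora:,} params, ~{adapter_size_mb(total_lora):.1f} MB")
```

Under these assumptions each adapted projection trains about 3% as many parameters as it contains, which is why the saved adapter lands in the megabyte range.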
As a fine-tuning framework, PEFT plugs into the rest of the Hugging Face stack: it works with Transformers for training and inference, Diffusers for managing adapters, and Accelerate for distributed training and inference on very large models.
What it does
- Supports several PEFT methods, including LoRA, DoRA, IA3, and soft-prompt methods such as prompt tuning
- Wraps any base model with get_peft_model so existing training loops keep working
- Saves small adapter checkpoints (e.g. ~19MB) rather than full multi-GB model copies
- Integrates with Transformers, Diffusers, and Accelerate out of the box
- Combines with quantization (such as QLoRA) to train large models on consumer GPUs
- Loads trained adapters for inference with PeftModel.from_pretrained
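As a rough picture of what a LoRA adapter adds to a wrapped layer, the sketch below implements the standard LoRA forward pass in plain Python: the frozen weight W is left untouched and a scaled low-rank update B·A is added on top. This is not PEFT's implementation; the function names and the tiny dimensions are hypothetical, and the zero initialization of B follows the common LoRA convention.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, r, alpha):
    """y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable, random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, r, alpha)   # equals y_base while B is all zeros

B[0][0] = 1.0                                 # stand-in for one training update to B
y_tuned = lora_forward(W, A, B, x, r, alpha)  # only the first output changes
```

Because B starts at zero, the wrapped layer initially behaves exactly like the base layer, and training moves the output only through the small A and B matrices.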
Getting started
Install PEFT, wrap a base model with a LoRA config to train an adapter, then reload that adapter for inference.
Install PEFT
Install the library from PyPI with pip.
pip install peft
Wrap a model with a PEFT method
Load a base model and wrap it with a LoRA config using get_peft_model, then train and save the adapter. print_trainable_parameters shows how few parameters you are actually training.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Use the current accelerator type when this torch version exposes it; otherwise fall back to "cuda"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)

peft_config = LoraConfig(
    r=16,                          # rank of the low-rank update matrices
    lora_alpha=32,                 # scaling factor applied to the update
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# train on your dataset, then save the adapter
model.save_pretrained("qwen2.5-3b-lora")
Load the adapter for inference
Reload the base model and apply your saved adapter with PeftModel.from_pretrained, then generate as usual.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Use the current accelerator type when this torch version exposes it; otherwise fall back to "cuda"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)

# Apply the saved LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "qwen2.5-3b-lora")

inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(**inputs.to(device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
When to use it
- Fine-tune a large language model for a specific task on a single consumer or cloud GPU instead of needing a full multi-GPU setup
- Train and ship many small task-specific adapters from one shared base model to save storage
- Combine LoRA with quantization (QLoRA) to fit and train a 7B-12B model that would otherwise run out of memory
- Adapt Diffusers image models or Whisper-style speech models using lightweight adapters
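The memory and storage arguments in the list above can be checked with simple arithmetic. The sketch below estimates the memory taken by model weights alone at 16-bit and 4-bit precision, and compares storing one full model copy per task against one shared base plus small adapters. It ignores activations, optimizer state, and quantization overhead, and the task count, base size, and adapter size are hypothetical values for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def storage_gb(n_tasks: int, base_gb: float, adapter_mb: float) -> tuple[float, float]:
    """Return (full copy per task, shared base plus one adapter per task) in GB."""
    full_copies = n_tasks * base_gb
    shared_base = base_gb + n_tasks * adapter_mb / 1000
    return full_copies, shared_base


# Weights only: 16-bit versus 4-bit for the model sizes mentioned above
for n in (7e9, 12e9):
    fp16 = weight_memory_gb(n, 16)
    int4 = weight_memory_gb(n, 4)
    print(f"{n / 1e9:.0f}B params: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")

# Hypothetical deployment: 10 tasks, a 6 GB base checkpoint, 20 MB per adapter
full, shared = storage_gb(n_tasks=10, base_gb=6.0, adapter_mb=20.0)
print(f"10 tasks: full copies ~{full:.1f} GB, shared base + adapters ~{shared:.1f} GB")
```

Actual training memory will be higher than these weight-only figures, so treat them as a lower bound when choosing hardware.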
How PEFT compares
PEFT alongside the other open-source fine-tuning tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| LLaMA-Factory | ★ 72.3k | An end-to-end training suite with a web UI that covers pre-training, supervised fine-tuning, and RLHF for hundreds of LLMs and multimodal models. |
| Unsloth | ★ 66.9k | A library that speeds up LoRA and QLoRA fine-tuning while cutting memory use, aimed at training models on a single GPU. |
| PEFT | ★ 21.3k | A Hugging Face library that fine-tunes large models by training small adapters instead of all the weights. |
| FinGPT | ★ 20.5k | FinGPT is an open-source project of financial LLMs, fine-tuned with LoRA on news and tweet data for tasks like sentiment analysis, relation extraction, and stock-move forecasting. |
| ms-swift | ★ 14.6k | ModelScope's framework for fine-tuning and deploying 600+ LLMs and 300+ multimodal models, supporting PEFT and full-parameter SFT, DPO, and GRPO. |
| LitGPT | ★ 13.4k | An open-source toolkit from Lightning AI to pretrain, finetune, and serve 20+ large language models, each written from scratch for speed and full control. |
| Axolotl | ★ 12.1k | A config-driven tool for fine-tuning and post-training open LLMs that supports SFT, LoRA/QLoRA, DPO, GRPO, and multi-GPU training across many model families. |
| Ludwig | ★ 11.7k | Ludwig is a low-code framework that lets you train, fine-tune, and deploy LLMs, multimodal, and tabular models using a YAML config instead of boilerplate Python. |