PEFT

Fine-tune large models by training tiny adapters instead of all the weights

github.com/huggingface/peft (★ 21.3k)
huggingface.co/docs/peft

Overview

PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that adapts large pretrained models by training only a small number of extra parameters instead of every weight in the model. This cuts the compute and storage needed for fine-tuning while typically reaching quality comparable to a fully fine-tuned model.

It is aimed at developers and ML engineers who want to specialize a base model for a downstream task but do not have the hardware to fine-tune billions of parameters. With methods like LoRA, you can train a 3B-parameter model on a single GPU and save an adapter that is only a few megabytes instead of gigabytes.

As a fine-tuning framework, PEFT plugs into the rest of the Hugging Face stack: it works with Transformers for training and inference, Diffusers for managing adapters on diffusion models, and Accelerate for distributed training and inference on very large models.

What it does

Supports several PEFT methods, including LoRA, DoRA, IA3, and soft-prompt methods such as prompt tuning
Wraps any base model with get_peft_model so existing training loops keep working
Saves small adapter checkpoints (e.g. ~19MB) rather than full multi-GB model copies
Integrates with Transformers, Diffusers, and Accelerate out of the box
Combines with quantization (such as QLoRA) to train large models on consumer GPUs
Loads trained adapters for inference with PeftModel.from_pretrained
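
To see why adapter checkpoints stay small, here is a rough back-of-the-envelope sketch. It assumes the standard LoRA construction, where a frozen weight matrix of shape d_out x d_in gets two trainable low-rank factors of shapes d_out x r and r x d_in. The layer sizes and helper names below are hypothetical and not taken from any particular model or from PEFT itself.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    # Factor A is r x d_in, factor B is d_out x r.
    return r * d_in + d_out * r


def adapter_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate on-disk size in MB, assuming float32 storage by default."""
    return num_params * bytes_per_param / 1e6


# Hypothetical model: 32 blocks, two adapted 2048 x 2048 projections per block.
d_model, rank, adapted_layers = 2048, 16, 32 * 2

per_layer = lora_param_count(d_model, d_model, rank)
total = per_layer * adapted_layers
full = d_model * d_model * adapted_layers  # weights in the adapted layers

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"Adapter size (fp32):   {adapter_size_mb(total):.1f} MB")
print(f"Fraction of adapted weights: {total / full:.4f}")
```

Under these assumed sizes the adapter comes to a few million parameters, which is the order of magnitude behind the "tens of megabytes" checkpoints mentioned above. Real numbers depend on the model, the rank, and which layers are adapted.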

Getting started

Install PEFT, wrap a base model with a LoRA config to train an adapter, then reload that adapter for inference.

Install PEFT

Install the library from PyPI with pip.

bash

pip install peft

Wrap a model with a PEFT method

Load a base model and wrap it with a LoRA config using get_peft_model, then train and save the adapter. print_trainable_parameters shows how few parameters you are actually training.

python

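import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Use the current accelerator when torch exposes that API; otherwise fall back to CUDA
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"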
model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# train on your dataset, then save the adapter
model.save_pretrained("qwen2.5-3b-lora")
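
For intuition about what the r and lora_alpha values control, the following is a minimal pure-Python sketch of a LoRA-style layer. It is not PEFT's actual implementation. It assumes the standard LoRA formulation, in which the output is the frozen layer's output plus a low-rank update scaled by lora_alpha / r, with factor B starting at zero. The matrices and helper names are made up for illustration, and the toy values r=1, lora_alpha=2 give the same scaling factor of 2 as r=16, lora_alpha=32.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Return W x + (lora_alpha / r) * B (A x) for a single input vector."""
    scaling = lora_alpha / r
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scaling * u for b, u in zip(base, update)]


# Toy sizes: 3 inputs, 2 outputs, rank 1 (all values are illustrative).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen weight, 2 x 3
A = [[0.5, 0.5, 0.0]]                     # trainable factor, r x 3
B_init = [[0.0], [0.0]]                   # trainable factor, 2 x r, starts at zero
x = [1.0, 2.0, 3.0]

# With B at zero, the wrapped layer reproduces the base layer exactly.
print(lora_forward(W, A, B_init, x, r=1, lora_alpha=2))

# Once training moves B away from zero, the output shifts.
B_trained = [[1.0], [-1.0]]
print(lora_forward(W, A, B_trained, x, r=1, lora_alpha=2))
```

Only A and B would receive gradients in this picture, which is why the saved adapter contains just those factors and not a copy of W.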

Load the adapter for inference

Reload the base model and apply your saved adapter with PeftModel.from_pretrained, then generate as usual.

python

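import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Use the current accelerator when torch exposes that API; otherwise fall back to CUDA
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"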
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(model, "qwen2.5-3b-lora")

inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(**inputs.to(device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Commands and code are distilled from the project's own documentation — always check the official repo for the latest.

When to use it

Fine-tune a large language model for a specific task on a single consumer or cloud GPU instead of needing a full multi-GPU setup
Train and ship many small task-specific adapters from one shared base model to save storage
Combine LoRA with quantization (QLoRA) to fit and train a 7B-12B model that would otherwise run out of memory
Adapt Diffusers image models or Whisper-style speech models using lightweight adapters
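
A rough sense of why quantization helps with the 7B-12B case can be had from weight-only memory arithmetic. The sketch below is an estimate under simplifying assumptions: it counts only the base model's weights, ignores activations, optimizer state, adapter parameters, and quantization overhead, assumes 4-bit storage for the quantized case, and treats 1 GB as 10^9 bytes. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for billions in (7, 12):
    params = billions * 1e9
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{billions}B params: 16-bit ~ {fp16:.1f} GB, 4-bit ~ {int4:.1f} GB")
```

Actual training memory will be higher than these figures, so treat them as a lower bound when deciding whether a model is likely to fit on a given GPU.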

How PEFT compares

PEFT alongside the other open-source fine-tuning tools that AI/TLDR tracks, ranked by GitHub stars.

Tool	Stars	What it does
LLaMA-Factory	★ 72.3k	An end-to-end training suite with a web UI that covers pre-training, supervised fine-tuning, and RLHF for hundreds of LLMs and multimodal models.
Unsloth	★ 66.9k	A library that speeds up LoRA and QLoRA fine-tuning while cutting memory use, aimed at training models on a single GPU.
PEFT	★ 21.3k	Fine-tune large models by training tiny adapters instead of all the weights
FinGPT	★ 20.5k	FinGPT is an open-source project of financial LLMs, fine-tuned with LoRA on news and tweet data for tasks like sentiment analysis, relation extraction, and stock-move forecasting.
ms-swift	★ 14.6k	ModelScope's framework for fine-tuning and deploying 600+ LLMs and 300+ multimodal models, supporting PEFT and full-parameter SFT, DPO, and GRPO.
LitGPT	★ 13.4k	An open-source toolkit from Lightning AI to pretrain, finetune, and serve 20+ large language models, each written from scratch for speed and full control.
Axolotl	★ 12.1k	A config-driven tool for fine-tuning and post-training open LLMs that supports SFT, LoRA/QLoRA, DPO, GRPO, and multi-GPU training across many model families.
Ludwig	★ 11.7k	Ludwig is a low-code framework that lets you train, fine-tune, and deploy LLMs, multimodal, and tabular models using a YAML config instead of boilerplate Python.
