AI/TLDR

XTuner

Train and fine-tune large language and vision-language models, including ultra-large MoE models

Overview

XTuner is an open-source toolkit from the InternLM team for fine-tuning and training large language models and vision-language models. It supports supervised fine-tuning, multimodal pre-training, and reinforcement learning methods such as GRPO, and runs on both NVIDIA GPUs and Ascend NPUs.

XTuner V1, released in September 2025, is a training engine designed for ultra-large Mixture-of-Experts (MoE) models. It can train models with hundreds of billions of parameters without the complexity of traditional 3D parallel setups, and it supports long sequences through a memory-efficient design and DeepSpeed Ulysses sequence parallelism.

XTuner is a fine-tuning framework aimed at teams whose training has outgrown a single-machine setup. It works with model families such as Qwen3, InternVL, InternS1, and DeepSeek V3, and it integrates with inference engines such as LMDeploy for deployment.

What it does

  • Fine-tunes LLMs and vision-language models, including multimodal pre-training and supervised fine-tuning
  • Dropless MoE training: 200B-scale models train without expert parallelism, and training scales up to 1T parameters
  • Long-sequence training, including 200B MoE models at 64k context, plus support for DeepSpeed Ulysses sequence parallelism
  • Reinforcement learning with GRPO; MPO, DAPO, and multi-turn agentic RL are on the roadmap
  • Runs on NVIDIA GPUs (FP8/BF16) and Ascend NPUs (BF16) across supported model families
  • Inference integration with LMDeploy, with vLLM and SGLang support planned

Getting started

Install XTuner from source, then run a small SFT job to confirm the setup works. A recent NVIDIA driver (greater than 550.127.08) is recommended.
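Before installing, it can help to confirm the driver recommendation above. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and that `sort -V` (version sort) is available; `MIN_DRIVER` and `version_gt` are illustrative names, not part of XTuner.

```bash
# Sketch: check whether the installed NVIDIA driver is newer than a minimum version.
MIN_DRIVER="550.127.08"

# version_gt A B: succeeds when A is strictly greater than B in version order.
# Relies on `sort -V`, which compares dotted numeric components.
version_gt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Read the driver version reported for the first GPU.
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if version_gt "$driver" "$MIN_DRIVER"; then
    echo "driver $driver is newer than $MIN_DRIVER"
  else
    echo "driver $driver is not newer than $MIN_DRIVER; consider upgrading"
  fi
else
  echo "nvidia-smi not found; skipping driver check"
fi
```

The comparison is strict, so a driver that is exactly 550.127.08 is reported as not newer.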

Clone and install

Clone the repository and install it in editable mode with pip.

git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e .

Run a minimal SFT job

Use torchrun with the V1 SFT entry point and an example config to verify training runs end to end.

torchrun xtuner/v1/train/cli/sft.py --model-cfg examples/v1/sft_qwen3_tiny.py \
  --chat_template qwen3 --dataset tests/resource/openai_sft.jsonl
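The same smoke test can, in principle, be launched with one process per GPU through torchrun's `--nproc_per_node` option. The sketch below is an assumption-laden wrapper, not something taken from the XTuner docs: `launch_sft`, `NPROC`, and `DRY_RUN` are hypothetical names, and whether the tiny example config behaves well on several GPUs is untested here. By default it only prints the command.

```bash
# launch_sft N: run the SFT smoke test with N processes (typically one per GPU).
# With DRY_RUN=1 (the default in this sketch) the command is printed, not executed.
launch_sft() {
  n_procs="$1"
  # Rebuild the positional parameters as the full torchrun command line.
  set -- torchrun --nproc_per_node="$n_procs" \
    xtuner/v1/train/cli/sft.py \
    --model-cfg examples/v1/sft_qwen3_tiny.py \
    --chat_template qwen3 \
    --dataset tests/resource/openai_sft.jsonl
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# NPROC is an illustrative environment variable; it defaults to a single process.
launch_sft "${NPROC:-1}"
```

Run it from the repository root, for example with `NPROC=8 DRY_RUN=0`, once the single-process command above has completed successfully.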

Optional speed and MoE extras

For faster training, install flash-attn (or flash-attn-3). When training MoE models, also install GroupedGEMM. Both are optional add-ons and are not required for the basic install.
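To see which of these optional extras are already present, a quick check along the following lines may help. It is only a sketch: `has_py_module` is a hypothetical helper, it assumes `python3` points at the environment where XTuner is installed, and the import names `flash_attn` and `grouped_gemm` are assumptions about how the packages are exposed.

```bash
# has_py_module NAME: succeeds when the active Python can locate module NAME.
has_py_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}

# Module names below are assumptions; adjust them if your builds differ.
for mod in flash_attn grouped_gemm; do
  if has_py_module "$mod"; then
    echo "$mod: installed"
  else
    echo "$mod: not installed (optional)"
  fi
done
```

A "not installed" result is not an error for the basic setup; it only indicates that the corresponding speed-up or MoE kernel is unavailable.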

Commands and code are distilled from the project's own documentation; always check the official repo for the latest version.

When to use it

  • Fine-tuning large MoE models (hundreds of billions of parameters) without setting up traditional 3D parallel training
  • Supervised fine-tuning of LLMs such as Qwen3 on instruction datasets
  • Multimodal pre-training and fine-tuning of vision-language models like InternVL
  • Running reinforcement learning (GRPO) on large models, with deployment through LMDeploy

How XTuner compares

XTuner alongside the other open-source fine-tuning frameworks that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| LLaMA-Factory | ★ 72.3k | An end-to-end training suite with a web UI that covers pre-training, supervised fine-tuning, and RLHF for hundreds of LLMs and multimodal models. |
| Unsloth | ★ 66.9k | A library that speeds up LoRA and QLoRA fine-tuning while cutting memory use, aimed at training models on a single GPU. |
| PEFT | ★ 21.3k | Hugging Face's library of parameter-efficient fine-tuning methods such as LoRA, DoRA, and prompt tuning that train small adapters instead of full models. |
| FinGPT | ★ 20.5k | FinGPT is an open-source project of financial LLMs, fine-tuned with LoRA on news and tweet data for tasks like sentiment analysis, relation extraction, and stock-move forecasting. |
| ms-swift | ★ 14.6k | ModelScope's framework for fine-tuning and deploying 600+ LLMs and 300+ multimodal models, supporting PEFT and full-parameter SFT, DPO, and GRPO. |
| LitGPT | ★ 13.4k | An open-source toolkit from Lightning AI to pretrain, finetune, and serve 20+ large language models, each written from scratch for speed and full control. |
| Axolotl | ★ 12.1k | A config-driven tool for fine-tuning and post-training open LLMs that supports SFT, LoRA/QLoRA, DPO, GRPO, and multi-GPU training across many model families. |
| XTuner | ★ 5.2k | Train and fine-tune large language and vision-language models, including ultra-large MoE models |