AI/TLDR

MMDetection

PyTorch object detection and segmentation toolbox with a large model zoo

Overview

MMDetection is an open-source object detection toolbox built on PyTorch and is part of the OpenMMLab project. It breaks the detection pipeline into separate modules, so you can mix and match backbones, necks, heads, and training settings to build a custom model without rewriting the whole framework.
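Mixing and matching parts usually works through a registry. Each component class registers under a name, and a config dict's type key selects the class to build. The sketch below is a simplified stand-in written for illustration, loosely modeled on the OpenMMLab registry pattern. It is not the MMEngine Registry itself. The Registry class and the component names (ToyResNet, ToyFPN, ToyHead, ToyDetector) are hypothetical.

```python
from typing import Any, Callable, Dict


class Registry:
    """Toy registry mapping a class name to the class (hypothetical, not mmengine's)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        # Copy so the caller's config is not mutated, then look up the class by its type key.
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


MODELS = Registry("models")


@MODELS.register_module()
class ToyResNet:
    def __init__(self, depth: int) -> None:
        self.depth = depth


@MODELS.register_module()
class ToyFPN:
    def __init__(self, out_channels: int) -> None:
        self.out_channels = out_channels


@MODELS.register_module()
class ToyHead:
    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes


@MODELS.register_module()
class ToyDetector:
    def __init__(self, backbone: dict, neck: dict, head: dict) -> None:
        # Each sub-config is built through the same registry, so any registered
        # backbone, neck, or head can be swapped in by changing its type string.
        self.backbone = MODELS.build(backbone)
        self.neck = MODELS.build(neck)
        self.head = MODELS.build(head)


detector = MODELS.build(dict(
    type="ToyDetector",
    backbone=dict(type="ToyResNet", depth=50),
    neck=dict(type="ToyFPN", out_channels=256),
    head=dict(type="ToyHead", num_classes=80),
))
print(type(detector.backbone).__name__, detector.head.num_classes)  # ToyResNet 80
```

To swap a component, you register another class and change one type string in the config. The detector code itself stays the same.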

It is aimed at computer vision researchers and engineers who need a tested base for detection and segmentation work. Out of the box it covers object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection, and it ships a large model zoo of pre-trained weights you can run or fine-tune.

Within the computer vision space, MMDetection sits alongside frameworks like Detectron2 as a config-driven training and inference library. It relies on two companion OpenMMLab packages, MMEngine for training and MMCV for vision operations, which you install before MMDetection itself.

What it does

  • Modular design that lets you assemble a custom detector by combining backbone, neck, and head components
  • Supports object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection out of the box
  • Large model zoo of pre-trained configs and weights, including RTMDet and MM-Grounding-DINO
  • Core bbox and mask operations run on GPU for fast training and inference
  • Config-based workflow for reproducible training and testing on standard datasets like COCO
  • Works with PyTorch 1.8+ and integrates with the OpenMMLab MMEngine and MMCV packages
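The config-based workflow depends on inheritance. A config names its base files in _base_ and overrides only the fields it changes. Below is a minimal sketch of that merge rule in plain Python. It assumes nested dicts merge key by key, and that a _delete_=True flag replaces the base dict outright, which is how MMEngine documents config inheritance. merge_config is a hypothetical helper, not the MMEngine implementation. It skips file loading, and the field values are illustrative.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge child into a copy of base (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and value.get("_delete_", False):
            # Replace the base entry entirely, dropping the marker key.
            merged[key] = {k: v for k, v in value.items() if k != "_delete_"}
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dicts field by field.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = dict(
    model=dict(
        type="RTMDet",
        bbox_head=dict(type="RTMDetSepBNHead", num_classes=80, in_channels=96),
    ),
    optim_wrapper=dict(optimizer=dict(type="AdamW", lr=0.004)),
)

# Child config: change only the class count, and swap the optimizer wholesale.
child = dict(
    model=dict(bbox_head=dict(num_classes=3)),
    optim_wrapper=dict(optimizer=dict(_delete_=True, type="SGD", lr=0.01)),
)

cfg = merge_config(base, child)
print(cfg["model"]["bbox_head"])  # {'type': 'RTMDetSepBNHead', 'num_classes': 3, 'in_channels': 96}
```

Without _delete_, the SGD settings would be merged into the AdamW dict and keep any AdamW-only keys. That is why a wholesale replacement needs the explicit flag.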

Getting started

Install PyTorch, then add the OpenMMLab dependencies with mim before installing MMDetection. The example below downloads a small RTMDet model and runs inference on a demo image.

Install PyTorch

Use conda to install PyTorch and torchvision. Use the GPU build if you have CUDA, or the CPU-only build otherwise.

```bash
conda install pytorch torchvision -c pytorch
# CPU-only machines: conda install pytorch torchvision cpuonly -c pytorch
```
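The build you installed (GPU or CPU) decides which device string to pass to init_detector later. Here is a small sketch that assumes only that PyTorch exposes torch.cuda.is_available(). pick_device is a hypothetical helper. It falls back to 'cpu' when PyTorch is missing or cannot see a CUDA GPU.

```python
def pick_device() -> str:
    """Return 'cuda:0' when a CUDA-enabled PyTorch build sees a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet, so there is no GPU to use.
        return "cpu"
    # A CPU-only build (or a machine without a visible GPU) reports False here.
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```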

Install MMEngine and MMCV with mim

MMDetection depends on these two OpenMMLab packages. The mim tool resolves the matching versions for you.

```bash
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
```

Install MMDetection

Install the published package with mim. For development, clone the open-mmlab/mmdetection repository instead and install it in editable mode by running "pip install -v -e ." from the repository root.

```bash
mim install mmdet
```
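To confirm that all three OpenMMLab packages installed, you can read their versions from package metadata with the standard library. This is a minimal sketch. installed_versions and meets_minimum are hypothetical helpers. The comparison mirrors the mmcv>=2.0.0 pin from the mim command above, but it is only a simple numeric check that ignores pre-release suffixes, so it is not pip's full version logic.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def installed_versions(
    names: Tuple[str, ...] = ("mmengine", "mmcv", "mmdet"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def numeric_prefix(ver: str) -> Tuple[int, ...]:
    """Parse the leading numeric release part, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            # Stop after a piece carrying a suffix such as '0rc1'.
            break
    return tuple(parts)


def meets_minimum(ver: str, minimum: str) -> bool:
    """Compare release numbers after zero-padding both to the same length."""
    a, b = numeric_prefix(ver), numeric_prefix(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
if versions["mmcv"] is not None and not meets_minimum(versions["mmcv"], "2.0.0"):
    print("mmcv is older than the 2.0.0 minimum pinned during installation")
```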

Download a model and run inference

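Fetch a pre-trained RTMDet model with mim from the shell, then run detection on a demo image with the Python API.

```bash
mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .
```

```python
from mmdet.apis import init_detector, inference_detector

# Config and checkpoint written to the current directory by mim download
config_file = 'rtmdet_tiny_8xb32-300e_coco.py'
checkpoint_file = 'rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
# Switch to device='cuda:0' when a CUDA build of PyTorch is installed
model = init_detector(config_file, checkpoint_file, device='cpu')
# demo/demo.jpg ships with the cloned repository; use any local image path otherwise
result = inference_detector(model, 'demo/demo.jpg')
print(result.pred_instances)
```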
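In MMDetection 3.x, calling inference_detector on a single image returns a DetDataSample. Its pred_instances field holds three tensors: bboxes in (x1, y1, x2, y2) form, scores, and labels. The sketch below filters detections by score on plain Python lists, so it runs without MMDetection installed. filter_detections and the 0.3 threshold are hypothetical choices. The commented lines show how the lists could be pulled from a real result, and the stand-in values are made up for the demo.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, ...]  # (x1, y1, x2, y2) in pixels


def filter_detections(
    bboxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    labels: Sequence[int],
    score_thr: float = 0.3,
) -> List[Tuple[Box, float, int]]:
    """Keep detections scoring at least score_thr, highest score first."""
    kept = [
        (tuple(box), score, label)
        for box, score, label in zip(bboxes, scores, labels)
        if score >= score_thr
    ]
    return sorted(kept, key=lambda det: det[1], reverse=True)


# With a real model (assumed mmdet 3.x layout):
# result = inference_detector(model, 'demo/demo.jpg')
# pred = result.pred_instances
# dets = filter_detections(pred.bboxes.tolist(), pred.scores.tolist(), pred.labels.tolist())

# Made-up stand-in values so the sketch runs on its own.
dets = filter_detections(
    bboxes=[[10, 20, 110, 220], [50, 60, 80, 90], [0, 0, 30, 40]],
    scores=[0.92, 0.15, 0.48],
    labels=[0, 2, 0],
)
for box, score, label in dets:
    print(label, round(score, 2), box)
```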

Commands and code are distilled from the project's own documentation; always check the official repo for the latest instructions.

When to use it

  • Run a pre-trained detector from the model zoo to get bounding boxes or masks on your own images
  • Fine-tune an existing detection or instance segmentation model on a custom dataset
  • Build and benchmark a new detection architecture by swapping modular components in a config
  • Reproduce or compare published detection and segmentation results on standard datasets like COCO
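For fine-tuning, the usual route is a small config that inherits a model-zoo config. It overrides the class count, the dataset paths, and the starting weights. The sketch below targets the RTMDet-tiny config downloaded with mim, whose head sits at model.bbox_head. It assumes these things:

- The downloaded config file sits in the same directory.
- The annotations are COCO-format JSON under a hypothetical data/my_dataset/ layout.
- The file name and class names are placeholders.

```python
# my_rtmdet_tiny_finetune.py (hypothetical file name), saved next to the
# downloaded rtmdet_tiny_8xb32-300e_coco.py so the relative _base_ path resolves.
_base_ = './rtmdet_tiny_8xb32-300e_coco.py'

data_root = 'data/my_dataset/'            # hypothetical dataset location
metainfo = dict(classes=('cat', 'dog'))   # placeholder class names

# num_classes must equal the number of entries in metainfo['classes'].
model = dict(bbox_head=dict(num_classes=2))

train_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/train.json',   # COCO-format annotations (assumed)
        data_prefix=dict(img='images/train/'),
    )
)
val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/val/'),
    )
)
test_dataloader = val_dataloader
val_evaluator = dict(ann_file=data_root + 'annotations/val.json')
test_evaluator = val_evaluator

# Start from the COCO-pretrained checkpoint fetched by mim download.
load_from = 'rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
```

From a clone of the repository, you would then start training with python tools/train.py my_rtmdet_tiny_finetune.py.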

How MMDetection compares

MMDetection alongside other open-source vision & understanding tools that AI/TLDR tracks, ranked by GitHub stars.

| Tool | Stars | What it does |
| --- | --- | --- |
| PaddleOCR | ★ 83.1k | A toolkit for detecting and recognizing text in images across many languages, plus document parsing. |
| Ultralytics YOLO | ★ 58.6k | A framework for training and running YOLO models for real-time object detection, segmentation, and tracking. |
| Supervision | ★ 44.7k | A Python toolkit for processing, annotating, and visualizing detections and segmentations from many vision models. |
| MMDetection | ★ 32.8k | PyTorch object detection and segmentation toolbox with a large model zoo |
| Segment Anything 2 (SAM 2) | ★ 19.4k | Meta's model for segmenting and tracking any object across images and video frames from clicks or boxes. |
| Grounded-SAM | ★ 17.6k | A pipeline that combines Grounding DINO and Segment Anything to detect and segment objects from text prompts. |
| DINOv3 | ★ 10.7k | Meta's self-supervised vision backbone that produces general-purpose image features for many downstream tasks. |
| Segment Anything 3 (SAM 3) | ★ 10.6k | Meta's segmentation model that detects, segments, and tracks objects in images and video from text or visual prompts. |