MMDetection

PyTorch object detection and segmentation toolbox with a large model zoo

github.com/open-mmlab/mmdetection · ★ 32.8k · mmdetection.readthedocs.io

Overview

MMDetection is an open-source object detection toolbox built on PyTorch and is part of the OpenMMLab project. It breaks the detection pipeline into separate modules, so you can mix and match backbones, necks, heads, and training settings to build a custom model without rewriting the whole framework.

It is aimed at computer vision researchers and engineers who need a tested base for detection and segmentation work. Out of the box it covers object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection, and it ships a large model zoo of pre-trained weights you can run or fine-tune.
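Here is how the backbone, neck, and head split looks in practice. The sketch below is an abbreviated model description in MMDetection's dict-based config style. The type names (RetinaNet, ResNet, FPN, RetinaHead) follow the RetinaNet configs in the repository. However, the fragment omits anchors, losses, and train/test settings, so read it as an illustration of the layout rather than a complete, trainable config.

```python
# Abbreviated one-stage detector description in MMDetection's config style.
# Each component is a dict whose 'type' names a registered class; the other
# keys are passed to that class as constructor arguments.
model = dict(
    type='RetinaNet',
    backbone=dict(               # feature extractor
        type='ResNet',
        depth=50,
        out_indices=(0, 1, 2, 3)),
    neck=dict(                   # fuses multi-scale backbone features
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    bbox_head=dict(              # predicts class scores and boxes per level
        type='RetinaHead',
        num_classes=80,
        in_channels=256))        # must match the neck's out_channels

# Swapping a component means editing one sub-dict, e.g. a deeper ResNet
# (ResNet-101 keeps the same output channel widths as ResNet-50).
model['backbone']['depth'] = 101
print(model['backbone'])
```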

Within the computer vision space, MMDetection sits alongside frameworks like Detectron2 as a config-driven training and inference library. It relies on two companion OpenMMLab packages, MMEngine for training and MMCV for vision operations, which you install before MMDetection itself.
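In this setting, "config-driven" means each experiment is a small Python config file. The file inherits from a base file through a _base_ variable and overrides only the keys that change; MMEngine's config loader performs the merge. The sketch below imitates that merge rule in plain Python so the behaviour is easy to see. It is a simplified stand-in, not MMEngine's implementation. merge_cfg is a hypothetical helper that models only two things: recursive overriding and the _delete_=True replacement flag.

```python
import copy


def merge_cfg(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base` (simplified sketch)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so pop() leaves the caller's dict intact
            replace = value.pop('_delete_', False)
            if replace or not isinstance(merged.get(key), dict):
                # The child fully defines this key: start from an empty dict.
                merged[key] = merge_cfg({}, value)
            else:
                # Otherwise merge the child's keys into the inherited dict.
                merged[key] = merge_cfg(merged[key], value)
        else:
            # Scalars, lists, and tuples replace the inherited value outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config and child override, for illustration only.
base = dict(model=dict(backbone=dict(type='ResNet', depth=50), neck=dict(type='FPN')))
child = dict(model=dict(backbone=dict(depth=101)))

cfg = merge_cfg(base, child)
print(cfg['model']['backbone'])  # {'type': 'ResNet', 'depth': 101}

# _delete_=True discards the inherited backbone settings instead of merging.
swapped = merge_cfg(base, dict(model=dict(backbone=dict(_delete_=True, type='SwinTransformer'))))
print(swapped['model']['backbone'])  # {'type': 'SwinTransformer'}
```

The child file only has to state what differs. As a result, a family of experiments can share one base file, and each variant stays a few lines long.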

What it does

Modular design that lets you assemble a custom detector by combining backbone, neck, and head components
Supports object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection out of the box
Large model zoo of pre-trained configs and weights, including RTMDet and MM-Grounding-DINO
Core bbox and mask operations run on GPU for fast training and inference
Config-based workflow for reproducible training and testing on standard datasets like COCO
Works with PyTorch 1.8+ and integrates with the OpenMMLab MMEngine and MMCV packages
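Much of that box arithmetic reduces to simple geometry. For reference, the sketch below computes intersection-over-union (IoU) for two boxes in (x1, y1, x2, y2) pixel format. IoU is the quantity used to match predictions to ground truth and to suppress duplicate boxes. This is a plain-Python illustration for a single pair of boxes, and box_iou is a hypothetical helper name. MMDetection's own routines run on batches of tensors on the GPU.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region (width/height clamp to 0 if disjoint).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 square: 25 / (100 + 100 - 25), about 0.143
print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))
```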

Getting started

Install PyTorch, then add the OpenMMLab dependencies with mim before installing MMDetection. The example below downloads a small RTMDet model and runs inference on a demo image.

Install PyTorch

Use conda to install PyTorch and torchvision. Use the GPU build if you have CUDA, or the CPU-only build otherwise.

bash

conda install pytorch torchvision -c pytorch           # GPU platforms
conda install pytorch torchvision cpuonly -c pytorch   # CPU-only platforms
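The build you install determines which device string you can later pass to init_detector. 'cuda:0' needs a CUDA-enabled PyTorch and a visible GPU; otherwise use 'cpu'. The small helper below makes that choice at run time and falls back to the CPU if PyTorch cannot be imported yet. pick_device is hypothetical and not part of MMDetection.

```python
def pick_device(index: int = 0) -> str:
    """Return a device string for init_detector: 'cuda:<index>' or 'cpu'."""
    try:
        import torch  # may be missing, or a CPU-only build
    except ImportError:
        return 'cpu'
    # Use the requested GPU only if CUDA works and that GPU index exists.
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        return f'cuda:{index}'
    return 'cpu'


print(pick_device())
```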

Install MMEngine and MMCV with mim

MMDetection depends on these two OpenMMLab packages. The mim tool resolves the matching versions for you.

bash

pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"

Install MMDetection

Install the published package with mim, or clone the repo and install in editable mode for development.

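bash

mim install mmdet        # published package
pip install -v -e .      # or, for development, run from the root of a git clone instead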
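One quick way to confirm that PyTorch and the OpenMMLab packages all landed in the same environment is to read their installed versions with the standard library. This is a sketch, not an MMDetection tool, and installed_version and release_tuple are hypothetical helpers. The MMCV check only mirrors the >=2.0.0 lower bound from the mim install command. It compares numeric release components and ignores suffixes such as rc1 or +cu118.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def release_tuple(ver: str) -> tuple:
    """Leading numeric release components padded to 3, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in ver.split('.'):
        m = re.match(r'\d+', piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # stop at the first suffix such as 'rc1'
            break
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


for name in ('torch', 'torchvision', 'mmengine', 'mmcv', 'mmdet'):
    print(f'{name:12s} {installed_version(name) or "not installed"}')

mmcv_ver = installed_version('mmcv')
if mmcv_ver is not None and release_tuple(mmcv_ver) < (2, 0, 0):
    print('mmcv is older than 2.0.0; reinstall with: mim install "mmcv>=2.0.0"')
```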

Download a model and run inference

Fetch a pre-trained RTMDet model, then run detection on a demo image with the Python API.

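bash

mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .

python

from mmdet.apis import init_detector, inference_detector

# Config and checkpoint saved to the current directory by mim download
config_file = 'rtmdet_tiny_8xb32-300e_coco.py'
checkpoint_file = 'rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
model = init_detector(config_file, checkpoint_file, device='cpu')  # or 'cuda:0'
# demo/demo.jpg ships with the repository; substitute your own image path
result = inference_detector(model, 'demo/demo.jpg')
print(result.pred_instances)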
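In MMDetection 3.x, inference_detector returns a DetDataSample. Its pred_instances field holds bboxes, scores, and labels tensors, and it typically includes many low-confidence candidates. A common next step is to keep only detections above a score threshold. The helper below is a sketch that works on plain Python lists, so it runs without MMDetection installed. filter_by_score and the 0.3 threshold are illustrative choices, and the toy values stand in for real predictions.

```python
def filter_by_score(bboxes, scores, labels, score_thr=0.3):
    """Keep detections whose confidence is at least score_thr."""
    kept = []
    for box, score, label in zip(bboxes, scores, labels):
        if score >= score_thr:
            kept.append({'bbox': box, 'score': score, 'label': label})
    return kept


# Toy values standing in for pred_instances.bboxes/scores/labels as lists.
bboxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 15.0, 15.0]]
scores = [0.91, 0.12]
labels = [2, 0]
for det in filter_by_score(bboxes, scores, labels):
    print(det)

# With a real prediction, where result is the value returned by inference_detector:
# inst = result.pred_instances
# dets = filter_by_score(inst.bboxes.tolist(), inst.scores.tolist(), inst.labels.tolist())
```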

Commands and code are distilled from the project's own documentation; always check the official repo for the latest.

When to use it

Run a pre-trained detector from the model zoo to get bounding boxes or masks on your own images
Fine-tune an existing detection or instance segmentation model on a custom dataset
Build and benchmark a new detection architecture by swapping modular components in a config
Reproduce or compare published detection and segmentation results on standard datasets like COCO
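For the fine-tuning case, the usual pattern is a short config that inherits a model-zoo config and overrides three things: the class list, the head's num_classes, and the dataset paths. The fragment below follows the layout of MMDetection 3.x configs. The base config path assumes the file is saved at the repository root. The class names, the data/my_dataset/ annotation and image paths, and the checkpoint path are hypothetical placeholders to replace with your own. Depending on the base model, other heads, such as a mask head, may also need num_classes changed.

```python
# my_faster-rcnn_finetune.py  (hypothetical file, saved at the repository root)
_base_ = './configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py'

# Class names of the custom dataset (hypothetical).
metainfo = dict(classes=('cat', 'dog'))

# The classification layer must match the new class count: 2 == len(metainfo['classes']).
model = dict(roi_head=dict(bbox_head=dict(num_classes=2)))

data_root = 'data/my_dataset/'  # hypothetical COCO-format dataset
train_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/train.json',
        data_prefix=dict(img='train/')))
val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/val.json',
        data_prefix=dict(img='val/')))
test_dataloader = val_dataloader

val_evaluator = dict(ann_file=data_root + 'annotations/val.json')
test_evaluator = val_evaluator

# Start from COCO-pretrained weights (placeholder for a model-zoo checkpoint).
load_from = 'path/to/faster_rcnn_coco_checkpoint.pth'
```

Training would then typically be launched with the repository's tools/train.py script, passing this config file as its argument.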

How MMDetection compares

MMDetection alongside other open-source vision and understanding tools tracked by AI/TLDR, ranked by GitHub stars.

Tool	Stars	What it does
PaddleOCR	★ 83.1k	A toolkit for detecting and recognizing text in images across many languages, plus document parsing.
Ultralytics YOLO	★ 58.6k	A framework for training and running YOLO models for real-time object detection, segmentation, and tracking.
Supervision	★ 44.7k	A Python toolkit for processing, annotating, and visualizing detections and segmentations from many vision models.
MMDetection	★ 32.8k	PyTorch object detection and segmentation toolbox with a large model zoo
Segment Anything 2 (SAM 2)	★ 19.4k	Meta's model for segmenting and tracking any object across images and video frames from clicks or boxes.
Grounded-SAM	★ 17.6k	A pipeline that combines Grounding DINO and Segment Anything to detect and segment objects from text prompts.
DINOv3	★ 10.7k	Meta's self-supervised vision backbone that produces general-purpose image features for many downstream tasks.
Segment Anything 3 (SAM 3)	★ 10.6k	Meta's segmentation model that detects, segments, and tracks objects in images and video from text or visual prompts.
