Overview
MMDetection is an open-source object detection toolbox built on PyTorch and is part of the OpenMMLab project. It breaks the detection pipeline into separate modules, so you can mix and match backbones, necks, heads, and training settings to build a custom model without rewriting the whole framework.
It is aimed at computer vision researchers and engineers who need a tested base for detection and segmentation work. Out of the box it covers object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection, and it ships a large model zoo of pre-trained weights you can run or fine-tune.
Within the computer vision space, MMDetection sits alongside frameworks like Detectron2 as a config-driven training and inference library. It relies on two companion OpenMMLab packages, MMEngine for training and MMCV for vision operations, which you install before MMDetection itself.
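MMDetection configs describe a model as nested dicts, one per component, each naming the class to build in a `type` key, and MMEngine turns those dicts into objects through a registry. The toy sketch below only illustrates that pattern in plain Python. `REGISTRY`, `register`, `build`, `build_detector`, and the `Tiny*` components are hypothetical names, not MMEngine's or MMDetection's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical toy registry mapping a string "type" to a component class.
# This is NOT MMEngine's Registry; it only mimics the idea.
REGISTRY: Dict[str, Callable] = {}


def register(cls):
    """Class decorator that records a component under its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg: dict):
    """Instantiate the component named by cfg['type'] with the remaining keys."""
    cfg = dict(cfg)  # copy so the caller's config is not mutated by pop()
    component_cls = REGISTRY[cfg.pop("type")]
    return component_cls(**cfg)


# Hypothetical interchangeable parts (names are illustrative only).
@register
class TinyBackbone:
    def __init__(self, depth: int = 18):
        self.depth = depth


@register
class TinyNeck:
    def __init__(self, out_channels: int = 256):
        self.out_channels = out_channels


@register
class TinyHead:
    def __init__(self, num_classes: int = 80):
        self.num_classes = num_classes


@dataclass
class Detector:
    backbone: object
    neck: object
    head: object


def build_detector(cfg: dict) -> Detector:
    """Assemble a detector from three independently configured sub-configs."""
    return Detector(
        backbone=build(cfg["backbone"]),
        neck=build(cfg["neck"]),
        head=build(cfg["head"]),
    )


config = dict(
    backbone=dict(type="TinyBackbone", depth=50),
    neck=dict(type="TinyNeck", out_channels=128),
    head=dict(type="TinyHead", num_classes=3),
)
detector = build_detector(config)
print(type(detector.backbone).__name__, detector.head.num_classes)  # TinyBackbone 3
```

Changing the head's `num_classes` or swapping in another backbone means editing only that sub-dict; the other components and the build code stay the same, which is the property the modular design relies on.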
What it does
- Modular design that lets you assemble a custom detector by combining backbone, neck, and head components
- Supports object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection out of the box
- Large model zoo of pre-trained configs and weights, including RTMDet and MM-Grounding-DINO
- Core bbox and mask operations run on GPU for fast training and inference
- Config-based workflow for reproducible training and testing on standard datasets like COCO
- Works with PyTorch 1.8+ and integrates with the OpenMMLab MMEngine and MMCV packages
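Since MMDetection needs compatible versions of PyTorch, MMCV, and MMEngine, it can help to check what is installed after following the steps under Getting started. This sketch uses only the standard library's `importlib.metadata`. The minimums come from this page (PyTorch 1.8+, `mmcv>=2.0.0`); the helper names are hypothetical, and the version parsing is deliberately rough (major and minor only), not a full PEP 440 comparison.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimums taken from this page: PyTorch 1.8+ and mmcv>=2.0.0.
# No minimum is listed here for mmengine or mmdet, so any installed version passes.
MINIMUMS: Dict[str, Optional[Tuple[int, int]]] = {
    "torch": (1, 8),
    "mmcv": (2, 0),
    "mmengine": None,
    "mmdet": None,
}


def major_minor(version: str) -> Tuple[int, int]:
    """Parse e.g. '1.13.1+cu117' or '2.0.0rc4' into (major, minor)."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def check_environment(installed: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return 'ok', 'missing', or 'too old (<version>)' for each package."""
    report = {}
    for name, minimum in MINIMUMS.items():
        version = installed.get(name)
        if version is None:
            report[name] = "missing"
        elif minimum is not None and major_minor(version) < minimum:
            report[name] = f"too old ({version})"
        else:
            report[name] = "ok"
    return report


def installed_versions() -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None when a package is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in MINIMUMS:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for pkg, status in check_environment(installed_versions()).items():
        print(f"{pkg}: {status}")
```

Keeping `check_environment` separate from `installed_versions` means the comparison logic can be tried on a hand-written dict without any of the packages installed.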
Getting started
Install PyTorch, then add the OpenMMLab dependencies with mim before installing MMDetection. The example below downloads a small RTMDet model and runs inference on a demo image.
Install PyTorch
Use conda to install PyTorch and torchvision. The command below is the GPU (CUDA) install; for a CPU-only build, add the cpuonly package to the same command.
conda install pytorch torchvision -c pytorch
Install MMEngine and MMCV with mim
MMDetection depends on these two OpenMMLab packages. The mim tool picks MMCV builds that match your installed PyTorch and CUDA versions.
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
Install MMDetection
Install the published package with mim, or clone the repo and install in editable mode for development.
mim install mmdet
Download a model and run inference
Fetch a pre-trained RTMDet model, then run detection on a demo image with the Python API.
mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .
from mmdet.apis import init_detector, inference_detector
config_file = 'rtmdet_tiny_8xb32-300e_coco.py'
checkpoint_file = 'rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
model = init_detector(config_file, checkpoint_file, device='cpu')  # or device='cuda:0' on a GPU
result = inference_detector(model, 'demo/demo.jpg')  # demo/demo.jpg ships with the MMDetection repository
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
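In MMDetection 3.x (the line that pairs with `mmcv>=2.0.0`), calling `inference_detector` on a single image returns a `DetDataSample` whose `pred_instances` carries `bboxes`, `scores`, and `labels`. A common next step is dropping low-confidence boxes. The sketch below does that on plain Python lists so it runs without MMDetection installed; `keep_confident` and its 0.3 default threshold are hypothetical choices, not library defaults.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def keep_confident(
    bboxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    labels: Sequence[int],
    score_thr: float = 0.3,
) -> List[Tuple[Box, float, int]]:
    """Keep detections whose score is at least score_thr, highest score first."""
    kept = [
        (tuple(box), float(score), int(label))
        for box, score, label in zip(bboxes, scores, labels)
        if score >= score_thr
    ]
    kept.sort(key=lambda det: det[1], reverse=True)
    return kept


# Made-up predictions for illustration only.
example = keep_confident(
    bboxes=[(10, 10, 50, 60), (5, 5, 20, 20), (30, 40, 90, 120)],
    scores=[0.92, 0.12, 0.55],
    labels=[0, 2, 0],
)
for box, score, label in example:
    print(label, round(score, 2), box)
```

To feed it real predictions, convert the tensors first, for example `inst = result.pred_instances` followed by `keep_confident(inst.bboxes.tolist(), inst.scores.tolist(), inst.labels.tolist())`, where `result` is the value returned by `inference_detector`.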
When to use it
- Run a pre-trained detector from the model zoo to get bounding boxes or masks on your own images
- Fine-tune an existing detection or instance segmentation model on a custom dataset
- Build and benchmark a new detection architecture by swapping modular components in a config
- Reproduce or compare published detection and segmentation results on standard datasets like COCO
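Reproducing or comparing COCO results depends on how predictions are matched to ground truth: COCO-style evaluation matches a prediction to a ground-truth box only when their intersection over union (IoU) meets a threshold, and averages precision over IoU thresholds from 0.50 to 0.95. The pure-Python sketch below computes IoU for two boxes in (x1, y1, x2, y2) form. It is a reference illustration with hypothetical helper names, not the implementation used by MMDetection or pycocotools.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 >= x1, y2 >= y1


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate or inverted boxes count as 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap rectangle: larger top-left corner, smaller bottom-right corner.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = box_area((ix1, iy1, ix2, iy2))  # 0 when the boxes do not overlap
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, roughly 0.142857
```

Two 2x2 boxes offset by one pixel share a 1x1 square, so the IoU is 1 / (4 + 4 - 1) = 1/7, well below the 0.50 threshold at which COCO evaluation starts counting matches.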
How MMDetection compares
MMDetection alongside other open-source vision and understanding tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| PaddleOCR | ★ 83.1k | A toolkit for detecting and recognizing text in images across many languages, plus document parsing. |
| Ultralytics YOLO | ★ 58.6k | A framework for training and running YOLO models for real-time object detection, segmentation, and tracking. |
| Supervision | ★ 44.7k | A Python toolkit for processing, annotating, and visualizing detections and segmentations from many vision models. |
| MMDetection | ★ 32.8k | A PyTorch toolbox for object detection and segmentation with a large model zoo. |
| Segment Anything 2 (SAM 2) | ★ 19.4k | Meta's model for segmenting and tracking any object across images and video frames from clicks or boxes. |
| Grounded-SAM | ★ 17.6k | A pipeline that combines Grounding DINO and Segment Anything to detect and segment objects from text prompts. |
| DINOv3 | ★ 10.7k | Meta's self-supervised vision backbone that produces general-purpose image features for many downstream tasks. |
| Segment Anything 3 (SAM 3) | ★ 10.6k | Meta's segmentation model that detects, segments, and tracks objects in images and video from text or visual prompts. |