MMDetection

Open MMLab Detection Toolbox and Benchmark

Posted by JY on May 21, 2020

1. Introduction

Key features of MMDetection:

  • Modular design: detection pipelines are decomposed into components with high reusability and independence, so modules can be freely combined
  • Memory-efficient
  • Supports many detection frameworks
  • State-of-the-art (SOTA) performance

2. Supported Frameworks

  • Single-stage detectors

    SSD (2015), RetinaNet (2017), GHM (2019), FCOS (2019), FSAF (2019)

  • Two-stage detectors

    Faster R-CNN (2015), R-FCN (2016), Mask R-CNN (2017), Grid R-CNN (2018), Mask Scoring R-CNN (2019), Double-Head R-CNN (2019)

  • Multi-stage detectors

    Cascade R-CNN (2017), Hybrid Task Cascade (2019)

  • General modules and methods

    soft-NMS (2017), DCN (2017), OHEM (2016), DCNv2 (2018), Train from Scratch (2018), ScratchDet (2018), M2Det (2018), GCNet (2019), Generalized Attention (2019), SyncBN (2018), Group Normalization (2018), Weight Standardization (2019), HRNet (2019), Guided Anchoring (2019), Libra R-CNN (2019)

3. Architecture

Model representation: a detector is decomposed into the following components:

Backbone (e.g., ResNet), Neck (e.g., FPN), DenseHead (e.g., AnchorHead), RoIExtractor, and RoIHead (BBoxHead/MaskHead)

The overall architecture is shown below; a minimal build sketch follows the diagram.

[Architecture diagram]
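
As a rough illustration of this decomposition, the sketch below builds a two-stage detector from a config file and inspects its components. It assumes the mmdetection 1.x API (mmcv.Config and mmdet.models.build_detector); the config path is a placeholder.

from mmcv import Config
from mmdet.models import build_detector

# Placeholder path: any two-stage config under configs/ should work.
cfg = Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')

# build_detector assembles the modules declared in cfg.model
# (backbone, neck, dense head, RoI extractor, RoI head) into one nn.Module.
model = build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)

print(type(model.backbone).__name__)   # e.g., ResNet
print(type(model.neck).__name__)       # e.g., FPN
print(type(model.bbox_head).__name__)  # e.g., SharedFCBBoxHead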

4. Benchmarks

  • Datasets

    Both COCO-style and VOC-style datasets are supported.

  • Implementation details

    Images are resized to a maximum scale of 1333 × 800 without changing the aspect ratio (see the short sketch after this list).

    "1x" and "2x" mean 12 and 24 epochs respectively; "20e", adopted in cascade models, denotes 20 epochs.

  • Evaluation metrics

    Standard COCO evaluation metrics are adopted, with multiple IoU thresholds from 0.5 to 0.95.

    Region proposal network (RPN) results are measured with Average Recall (AR), and detection results are evaluated with mAP.
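
The keep-ratio resize mentioned above can be illustrated with a small standalone function (an illustrative re-implementation, not mmdetection's own resize code): the scale factor is capped by both the long edge (1333) and the short edge (800), and the tighter cap wins.

def rescale_size(w, h, max_long=1333, max_short=800):
    # The scale factor is limited by whichever edge constraint is tighter.
    scale = min(max_long / max(w, h), max_short / min(w, h))
    return round(w * scale), round(h * scale)

print(rescale_size(1920, 1080))  # (1333, 750): the long edge hits the 1333 cap
print(rescale_size(1024, 768))   # (1067, 800): the short edge hits the 800 cap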

5. Config File Explained

model

  • backbone: usually a fully convolutional network used to extract feature maps, e.g., ResNet

  • neck: connects the backbone and the heads, e.g., FPN

  • rpn_head: dense head that generates region proposals (not present in the Fast R-CNN config below, which loads precomputed proposals)
  • bbox_roi_extractor: extracts RoI features from the feature maps, e.g., RoIAlign
  • bbox_head: performs classification and bounding-box regression on the RoI features

train_cfg: training-time settings such as the proposal assigner and sampler

test_cfg: test-time settings such as the score threshold and NMS

Data pipelines

train_pipeline

test_pipeline

The full config of fast_rcnn_r50_fpn_1x is shown below:

# model settings
model = dict(
    type='FastRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=81,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict(
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    # data loading
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=2000),
    dict(type='LoadAnnotations', with_bbox=True),

    # preprocessing
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),

    # formatting
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'proposals', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadProposals', num_max_proposals=None),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img', 'proposals']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_train2017.pkl',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/fast_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
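
For context, here is a minimal sketch of how such a config is consumed, assuming the mmdetection 1.x layout (tools/train.py and mmcv.Config); the config path is a placeholder.

# Command line (single GPU): python tools/train.py configs/fast_rcnn_r50_fpn_1x.py
# Programmatically, the config can be loaded and overridden before training:
from mmcv import Config

cfg = Config.fromfile('configs/fast_rcnn_r50_fpn_1x.py')  # placeholder path
cfg.optimizer.lr = 0.01   # e.g., lower the LR when training with fewer GPUs
print(cfg.model.type, cfg.total_epochs)  # FastRCNN 12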