R-CNN Family

The evolution of region-based detectors: R-CNN, SPPNet, Fast R-CNN, Faster R-CNN

Papers: R-CNN · SPPNet · Fast R-CNN · Faster R-CNN

R-CNN

R-CNN Overview

R-CNN Architecture

  1. Selective Search generates ~2000 region proposals per image
  2. Warp each proposal to 227x227
  3. Run the CNN on each warped proposal
  4. Classify with per-class SVMs and refine the bbox with a linear regressor
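
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the original implementation: the proposal boxes are made-up stand-ins for Selective Search output, `warp_proposal` is a hypothetical helper, and nearest-neighbour resizing is an assumption chosen to keep the example dependency-free.

```python
import numpy as np

def warp_proposal(image, box, size=227):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and resize the
    crop to size x size with nearest-neighbour sampling (aspect ratio is
    not preserved, which is what "warp" means here)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # Map every output row/column back to a source row/column index.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows[:, None], cols[None, :]]

image = np.random.rand(300, 400, 3)
# Hypothetical proposals; a real pipeline would get ~2000 from Selective Search.
proposals = [(10, 20, 110, 220), (50, 60, 350, 160)]
batch = np.stack([warp_proposal(image, b) for b in proposals])
print(batch.shape)  # (2, 227, 227, 3)
```

Every row of `batch` would then go through the CNN separately, which is where the redundant computation comes from.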

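Cons: redundant computation (the CNN runs separately on each of the ~2000 proposals), slow multi-stage training (not end-to-end), and fixed region proposals (Selective Search is not learned).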

SPPNet

SPPNet Overview

SPPNet Architecture

  • Makes R-CNN fast at test time by running the CNN once on the full image and pooling features for each proposal from the shared feature map
  • Still has slow training and fixed region proposals

Fast R-CNN

Fast R-CNN Overview

Fast R-CNN Architecture

  1. Feature extraction over the full image (one CNN pass)
  2. RoI Pooling layer converts each proposal's region of the feature map into a fixed-size feature
  3. Softmax classifier replaces the SVMs

RoI Pooling Layer
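
A rough sketch of RoI max pooling, assuming the RoI is already given in integer feature-map coordinates with exclusive upper bounds. `roi_max_pool` is a hypothetical name, and the 7x7 output size is an assumption (a common choice, not something stated in these notes).

```python
import numpy as np

def roi_max_pool(feat, roi, out_size=7):
    """feat: C x H x W feature map.
    roi: (x1, y1, x2, y2) in feature-map coordinates, x2/y2 exclusive.
    Returns a C x out_size x out_size array regardless of the RoI size."""
    x1, y1, x2, y2 = roi
    region = feat[:, y1:y2, x1:x2]
    c, h, w = region.shape
    out = np.empty((c, out_size, out_size), dtype=feat.dtype)
    for i in range(out_size):
        # Integer floor/ceil bin edges; bins are never empty while h, w >= 1.
        r0 = (i * h) // out_size
        r1 = ((i + 1) * h + out_size - 1) // out_size
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = ((j + 1) * w + out_size - 1) // out_size
            out[:, i, j] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out

feat = np.arange(2 * 14 * 14, dtype=float).reshape(2, 14, 14)
print(roi_max_pool(feat, (0, 0, 14, 14)).shape)  # (2, 7, 7)
print(roi_max_pool(feat, (3, 2, 8, 13)).shape)   # (2, 7, 7)
```

Because the output shape no longer depends on the proposal size, the fully connected layers can follow directly, and the expensive convolution is shared across all proposals.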

Faster R-CNN

  • Introduces a Region Proposal Network (RPN) that replaces Selective Search and shares convolutional features with the detector
  • 9 anchor boxes per location (3 scales x 3 aspect ratios)
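
The 3 x 3 anchor layout can be sketched like this. The scales (128, 256, 512) and ratios (1:2, 1:1, 2:1) follow the Faster R-CNN paper's defaults; the stride of 16, the centre-of-cell placement, and the name `make_anchors` are assumptions for illustration.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return a (feat_h * feat_w * 9, 4) array of anchors in
    (x_center, y_center, w, h) image coordinates."""
    # Each anchor has area scale**2 and aspect ratio h / w = ratio.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r))
                       for s in scales for r in ratios])      # (9, 2)
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                               # (feat_h, feat_w)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)      # (N, 2)
    k = len(shapes)
    # Pair every centre with every anchor shape.
    return np.concatenate([np.repeat(centers, k, axis=0),
                           np.tile(shapes, (len(centers), 1))], axis=1)

anchors = make_anchors(2, 3)
print(anchors.shape)  # (54, 4): 2 * 3 locations, 9 anchors each
```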

RPN Architecture

RPN Explained

RPN Training

Training Pipeline

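  • Anchor label: $p^* = 1$ if IoU with a ground-truth box > 0.7, $p^* = 0$ if IoU < 0.3 with every ground-truth box; anchors in between are ignored during training
  • Network predicts offsets $(t_x, t_y, t_w, t_h)$ relative to the anchor $(x_a, y_a, w_a, h_a)$:
    • $t_x = (x-x_a)/w_a$, $t_y = (y-y_a)/h_a$, $t_w = \log(w/w_a)$, $t_h = \log(h/h_a)$
  • Loss: binary cross-entropy on the objectness score + Smooth L1 on the offsets of positive anchors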
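
The labelling rule, the offset encoding, and the two loss terms can be sketched together. This is a simplified single-anchor, single-ground-truth illustration: the function names are hypothetical, `-1` as the "ignored" label is a convention assumed here, and the y and h offsets follow the same pattern as the x and w offsets listed above.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def anchor_label(anchor, gt, hi=0.7, lo=0.3):
    """1 = positive, 0 = negative, -1 = ignored (assumed convention)."""
    v = iou(anchor, gt)
    if v > hi:
        return 1
    if v < lo:
        return 0
    return -1

def encode(box, anchor):
    """Offsets (t_x, t_y, t_w, t_h); both boxes as (x_center, y_center, w, h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha,
                     np.log(w / wa), np.log(h / ha)])

def smooth_l1(pred, target):
    """Quadratic for |d| < 1, linear beyond; summed over the coordinates."""
    d = np.abs(np.asarray(pred) - np.asarray(target))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())

def binary_ce(p, label):
    """Cross-entropy between predicted objectness p in (0, 1) and label in {0, 1}."""
    return float(-(label * np.log(p) + (1 - label) * np.log(1 - p)))

print(anchor_label((0, 0, 10, 10), (0, 0, 10, 9)))    # 1  (IoU = 0.9)
print(anchor_label((0, 0, 10, 10), (5, 0, 15, 10)))   # -1 (IoU = 1/3)
print(encode((12, 10, 20, 10), (10, 10, 10, 10)))     # t_x = 0.2, t_w = log 2
```

In training, the regression term would only be evaluated for anchors with label 1, and ignored anchors contribute to neither term.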

Performance

                        R-CNN   Fast R-CNN   Faster R-CNN
Test time/image (sec)   50      2            0.2
Speed-up                1X      25X          250X
mAP (VOC 2007)          66.0    66.9         66.9