Papers: R-CNN · SPPNet · Fast R-CNN · Faster R-CNN
R-CNN


- Selective Search generates ~2000 region proposals
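- Warp each proposal to 227x227 pixels (the CNN's fixed input size)
- Run the CNN separately on each warped proposal
- Per-class SVMs for classification + linear regressor for bounding-box refinement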
Cons: redundant computation (one CNN forward pass per proposal), slow multi-stage training (not end-to-end), and fixed (non-learned) region proposals.
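The crop-and-warp step in the R-CNN pipeline above can be illustrated with a minimal sketch. It assumes a 2-D image stored as a list of rows and a box given as end-exclusive pixel corners `(x1, y1, x2, y2)`; the function name `warp_proposal` and the nearest-neighbour sampling are illustrative choices, not the original implementation.

```python
def warp_proposal(image, box, out_size=227):
    """Crop `box` = (x1, y1, x2, y2) from `image` (a list of rows) and
    resize the crop to out_size x out_size by nearest-neighbour sampling.
    The aspect ratio is not preserved, which is what "warp" means here."""
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for i in range(out_size):
        # Map output row i back to a source row inside the crop.
        src_y = y1 + (i * crop_h) // out_size
        row = [image[src_y][x1 + (j * crop_w) // out_size]
               for j in range(out_size)]
        warped.append(row)
    return warped


# Toy 4x6 "image" whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(6)] for r in range(4)]
small = warp_proposal(image, (1, 0, 5, 4), out_size=2)
full = warp_proposal(image, (1, 0, 5, 4))  # 227 x 227, as fed to the CNN
```

Because this runs once per proposal, and the CNN then runs once per warped crop, roughly 2000 proposals mean roughly 2000 forward passes per image, which is the redundancy listed under Cons.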
SPPNet


- Speeds up R-CNN at test time by running the CNN once on the full image and pooling each proposal's features from the shared feature map
- Still: slow multi-stage training, fixed region proposals
Fast R-CNN


- Feature extraction over full image (one CNN pass)
- RoI Pooling layer for fixed-size features
- Softmax replaces SVM
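As a rough illustration of how RoI Pooling turns a region of any size into fixed-size features, here is a minimal sketch. It assumes a single-channel feature map stored as a list of rows and an RoI already expressed in feature-map coordinates as end-exclusive corners; `roi_pool` and its bin-boundary rounding are illustrative assumptions rather than the paper's exact code.

```python
def roi_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x1, y1, x2, y2) of `feature_map`
    into a fixed out_h x out_w grid, whatever the region's size."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [floor(i*h/out_h), ceil((i+1)*h/out_h)).
        ys = y1 + (i * h) // out_h
        ye = y1 + -(-((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            xs = x1 + (j * w) // out_w
            xe = x1 + -(-((j + 1) * w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
a = roi_pool(fmap, (0, 0, 4, 4))  # 4x4 region -> 2x2 output
b = roi_pool(fmap, (1, 1, 4, 4))  # 3x3 region -> also 2x2 output
```

Both calls return a 2x2 grid even though the regions differ in size, which is what lets the fully connected layers that follow accept arbitrary proposals.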

Faster R-CNN

- Introduces a Region Proposal Network (RPN) that replaces Selective Search and shares convolutional features with the detector
- 9 anchor boxes per location (3 scales x 3 aspect ratios)
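The 3 x 3 anchor layout can be sketched as follows. The scale and ratio values are the defaults reported in the Faster R-CNN paper; the function name `make_anchors`, the centre-format tuple, and the convention that a ratio means height / width are assumptions made for this example.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return the anchors (x_a, y_a, w_a, h_a) centred at (cx, cy).
    Each anchor has area scale**2 and aspect ratio h / w = ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)  # a wider box for ratio < 1
            h = s * math.sqrt(r)  # a taller box for ratio > 1
            anchors.append((cx, cy, w, h))
    return anchors


anchors = make_anchors(300, 200)  # 9 anchors for one feature-map location
```

Repeating this at every location of the shared feature map gives the full anchor set that the RPN scores and refines.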


RPN Training

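- Anchor labels: $p^* = 1$ if IoU with a ground-truth box > 0.7, $p^* = 0$ if IoU < 0.3; anchors in between are ignored during training
- Network predicts offsets relative to the anchor, $(t_x, t_y, t_w, t_h)$, where $(x, y, w, h)$ is the box and $(x_a, y_a, w_a, h_a)$ is the anchor:
  - $t_x = (x-x_a)/w_a$, $t_y = (y-y_a)/h_a$
  - $t_w = \log(w/w_a)$, $t_h = \log(h/h_a)$
- Loss: binary cross-entropy (object vs. background) + Smooth L1 (box regression)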
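The training targets above can be made concrete with a small sketch. It follows only the two IoU thresholds listed here; the helper names (`iou`, `anchor_label`, `encode`, `smooth_l1`) are hypothetical, and the $t_y$ and $t_h$ terms are assumed to mirror $t_x$ and $t_w$ using the anchor height.

```python
import math


def iou(a, b):
    """Intersection over union of two corner-format boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def anchor_label(iou_value, hi=0.7, lo=0.3):
    """p* = 1 (object), 0 (background), or None (ignored in the loss)."""
    if iou_value > hi:
        return 1
    if iou_value < lo:
        return 0
    return None


def encode(box, anchor):
    """Regression targets (t_x, t_y, t_w, t_h); both inputs are (x, y, w, h)
    in centre format."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))


def smooth_l1(d):
    """Quadratic near zero, linear for |d| >= 1."""
    d = abs(d)
    return 0.5 * d * d if d < 1.0 else d - 0.5


label = anchor_label(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # IoU = 1/3
targets = encode((60, 50, 200, 50), (50, 50, 100, 100))
```

The log parameterisation keeps width and height targets scale-invariant: a box twice as wide as its anchor yields $t_w = \log 2$ regardless of the anchor's absolute size.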
Performance

| | R-CNN | Fast R-CNN | Faster R-CNN |
|---|---|---|---|
| Test time per image (s) | 50 | 2 | 0.2 |
| Speed-up | 1X | 25X | 250X |
| mAP (VOC 2007) | 66.0 | 66.9 | 66.9 |