
Addresses class imbalance in anchor-based detection (most anchors contain no object).
Key Points
- Proposed: Focal loss, $\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$; as $\gamma$ increases, the weight on easy (well-classified) samples decreases
- Backbone: ResNet for powerful feature extraction
- Multi-scale prediction: predictions are made at several feature-pyramid levels
- 9 anchors per spatial location at each level (3 scales × 3 aspect ratios), each with a classification target and a box-regression target
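
The effect of $\gamma$ on easy samples can be checked numerically. Below is a minimal sketch of a per-anchor binary focal loss in plain Python. The function name `focal_loss`, the balancing factor `alpha`, the defaults `gamma=2.0` and `alpha=0.25`, and the example probabilities are assumptions chosen for illustration; they are not taken from these notes.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for a single anchor (illustrative sketch).

    p: predicted foreground probability, in [0, 1]
    y: ground-truth label, 1 for foreground and 0 for background
    gamma, alpha: assumed hyperparameters, not specified in the notes
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) for extreme predictions.
    p_t = min(max(p_t, eps), 1.0)
    # The modulating factor (1 - p_t)^gamma shrinks the loss of easy samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Compare an easy background anchor (p = 0.01) with a hard one (p = 0.9).
for gamma in (0.0, 1.0, 2.0, 5.0):
    easy = focal_loss(0.01, 0, gamma=gamma)
    hard = focal_loss(0.9, 0, gamma=gamma)
    print(f"gamma={gamma}: easy={easy:.3e}  hard={hard:.3e}  ratio={easy / hard:.3e}")
```

With `gamma=0` the expression reduces to $\alpha$-weighted cross-entropy. As `gamma` grows, the easy-to-hard loss ratio printed by the loop shrinks, so the many easy background anchors contribute less to the total loss.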