Title: R-FCN: Object Detection via Region-based Fully Convolutional Networks
Contribution:
- place the RoI pooling layer at the end of the network, after the last conv layer, so that almost all computation is shared over the whole image
- position-sensitive score maps to address the dilemma between translation invariance (wanted for classification) and translation variance (wanted for detection)
- PS RoI pooling (precondition: a conv layer with k^2 * (C + 1) output channels produces the score maps, where each RoI is divided into a k x k grid of bins and C is the number of classes): each bin average-pools only from its own group of C + 1 score maps, then all k^2 bins vote (by averaging) to give a (C + 1)-dim score vector
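The PS RoI pooling bullet above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `ps_roi_pool`, the channel layout (bin (i, j) owns a contiguous group of C + 1 channels), the RoI given directly in feature-map coordinates, and the floor/ceil bin boundaries are all assumptions made for the example.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling for a single RoI (illustrative sketch).

    score_maps: array of shape (k*k*(C+1), H, W)
    roi: (x0, y0, x1, y1) in feature-map coordinates (assumed already projected)
    Returns (scores, probs), each of shape (C+1,).
    """
    c1 = num_classes + 1
    n_ch, height, width = score_maps.shape
    assert n_ch == k * k * c1, "expected k^2 * (C + 1) channels"

    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k

    # Assumed layout: channel index = (i * k + j) * (C + 1) + c
    maps = score_maps.reshape(k, k, c1, height, width)

    bins = np.zeros((k, k, c1))
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            ys = min(max(int(np.floor(y0 + i * bin_h)), 0), height - 1)
            ye = min(max(int(np.ceil(y0 + (i + 1) * bin_h)), ys + 1), height)
            xs = min(max(int(np.floor(x0 + j * bin_w)), 0), width - 1)
            xe = min(max(int(np.ceil(x0 + (j + 1) * bin_w)), xs + 1), width)
            # Bin (i, j) pools only from its own group of C + 1 maps.
            bins[i, j] = maps[i, j, :, ys:ye, xs:xe].mean(axis=(1, 2))

    # Voting: average the k^2 bin responses into one (C + 1)-dim vector.
    scores = bins.mean(axis=(0, 1))
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return scores, exp / exp.sum()
```

Because every bin reads a different channel group, the same object part shifted to a different bin hits different score maps, which is how the pooled score becomes sensitive to position even though all layers before it are convolutional.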
Experiment:
- ResNet-101 backbone, followed by a conv layer with k^2 * (C + 1) output channels
- reduce the effective stride from 32 to 16 pixels: conv5 uses stride 1, with dilated conv in conv5 to compensate
- 83.6% mAP on PASCAL VOC 2007, 82.0% on VOC 2012; test time 170 ms per image
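To make the channel count and the stride change concrete, here is a small arithmetic sketch. The values k = 7, C = 20 (the 20 PASCAL VOC classes), and the 600 x 1000 input size are assumptions chosen for illustration, and the ceil-division feature-map size is only an approximation, since the exact size depends on padding in the backbone. Both helper names are hypothetical.

```python
def score_map_channels(k, num_classes):
    """Number of position-sensitive score maps: k^2 * (C + 1)."""
    return k * k * (num_classes + 1)

def feature_map_size(height, width, stride):
    """Approximate feature-map size via ceil division by the effective stride."""
    return (-(-height // stride), -(-width // stride))

# Assumed settings for illustration: k = 7, C = 20, 600 x 1000 input.
channels = score_map_channels(7, 20)
coarse = feature_map_size(600, 1000, 32)   # original effective stride
fine = feature_map_size(600, 1000, 16)     # after reducing the stride
print(channels, coarse, fine)
```

Halving the stride roughly doubles the feature-map resolution in each dimension, so each of the k x k bins of an RoI covers more feature-map cells.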
Future works:
- apply orthogonal extensions of FCNs (e.g., those developed for semantic segmentation) to R-FCN