Paper Note on R-FCN

Title: R-FCN: Object Detection via Region-based Fully Convolutional Networks

Contribution:

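  • places the RoI pooling layer after the last convolutional layer, so that nearly all computation is shared over the whole image instead of being repeated per RoI
  • proposes position-sensitive score maps to address the dilemma between translation invariance (wanted for classification) and translation variance (needed for detection)
  • position-sensitive RoI pooling: as a precondition, a conv layer with k^2(C+1) output channels produces the score maps, where each RoI is divided into a k x k grid of bins and C is the number of object classes (+1 for background). Each bin pools only from its own group of C+1 score maps, and the k^2 bins then vote to produce a (C+1)-dimensional score vector for the RoI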
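
The position-sensitive RoI pooling step can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the function names, the channel layout (bin row, bin column, class), the use of average pooling inside each bin, voting by averaging, and RoI coordinates given directly in feature-map cells are all assumptions made for the example.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling for a single RoI (illustrative sketch).

    score_maps: array of shape (k*k*(C+1), H, W)
    roi: (x0, y0, x1, y1) in feature-map cells, end-exclusive (assumed convention)
    Returns a (C+1,) score vector.
    """
    c1 = num_classes + 1
    h, w = score_maps.shape[1:]
    # Assumed channel layout: one group of C+1 maps per (bin_row, bin_col).
    maps = score_maps.reshape(k, k, c1, h, w)
    x0, y0, x1, y1 = roi
    ys = np.linspace(y0, y1, k + 1)  # vertical bin edges
    xs = np.linspace(x0, x1, k + 1)  # horizontal bin edges
    pooled = np.zeros((k, k, c1))
    for i in range(k):
        for j in range(k):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            yb, xb = max(yb, ya + 1), max(xb, xa + 1)  # keep bins non-empty
            # Bin (i, j) reads only from its own group of C+1 score maps.
            pooled[i, j] = maps[i, j, :, ya:yb, xa:xb].mean(axis=(1, 2))
    # The k*k bins vote by averaging their per-class responses.
    return pooled.mean(axis=(0, 1))

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Usage with random score maps: k = 3 bins per side, C = 4 classes.
rng = np.random.default_rng(0)
demo_maps = rng.standard_normal((3 * 3 * 5, 20, 20))
demo_probs = softmax(ps_roi_pool(demo_maps, (2, 3, 14, 18), k=3, num_classes=4))
```

Because every bin looks at a different group of channels, shifting the RoI changes which responses fall into which bin. That is how the pooled score becomes sensitive to object position even though the score maps themselves are computed once for the whole image.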

Experiment:

  • uses ResNet-101 as the backbone network, followed by a conv layer with k^2(C+1) channels
  • reduces the effective stride from 32 to 16 pixels and uses dilated convolution in conv5
  • 83.6% mAP on PASCAL VOC 2007 and 82.0% on PASCAL VOC 2012, with a test time of 170 ms per image
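
To make the sizes concrete, here is a small arithmetic sketch. The values k = 7, C = 20 (the PASCAL VOC class count) and the 600 x 1000 input are assumptions chosen for illustration, and the helper names are hypothetical. The feature-map size is a ceil-division approximation that ignores padding details.

```python
def score_map_channels(k, num_classes):
    # k*k position-sensitive groups, each with C+1 maps (classes + background).
    return k * k * (num_classes + 1)

def feature_map_size(height, width, stride):
    # Approximate spatial size of the final feature map (ceil division).
    return (-(-height // stride), -(-width // stride))

channels = score_map_channels(7, 20)
size_stride_32 = feature_map_size(600, 1000, 32)
size_stride_16 = feature_map_size(600, 1000, 16)
print(channels, size_stride_32, size_stride_16)
```

Halving the stride roughly doubles the resolution of the score maps along each axis, which gives each of the k x k bins more cells to pool from.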

Future work:

  • apply extensions of FCNs to R-FCN