Paper Note: Mask R-CNN

Title: Mask R-CNN

Contribution:

extends Faster R-CNN by adding a mask branch in parallel with the existing box classification and regression branch; this enables instance segmentation and also improves detection accuracy
- the mask branch is a small FCN applied to each RoI
- a mask encodes an input object’s spatial layout
- extracting the spatial structure of masks can be addressed naturally by the pixel-to-pixel correspondence provided by convolutions
- a fully convolutional mask head needs fewer parameters than one built from fully connected layers, and is more accurate
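shows that decoupling mask and class prediction is essential: the mask branch predicts one binary mask per class with a per-pixel sigmoid, without competition among classes, and its loss is the average binary cross-entropy on the mask of the ground-truth class only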
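The decoupled mask loss can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the function name `mask_loss`, the argument names, and the (K, m, m) layout for K classes are assumptions made for the example.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average binary cross-entropy on the ground-truth class's mask only.

    mask_logits: (K, m, m) raw scores, one m x m mask per class (assumed layout)
    gt_mask:     (m, m) binary target mask for this RoI
    gt_class:    index of the ground-truth class
    """
    z = mask_logits[gt_class]            # masks of other classes are ignored
    y = gt_mask.astype(np.float64)
    # Numerically stable sigmoid cross-entropy computed from logits:
    # max(z, 0) - z * y + log(1 + exp(-|z|))
    per_pixel = np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return per_pixel.mean()
```

Because only the ground-truth class's mask enters the loss, changing the logits of any other class leaves the value unchanged; the class label itself comes from the separate classification branch.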
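proposes RoIAlign for predicting pixel-accurate masks
- avoids quantizing the RoI boundaries or bins; feature values at the sampling points are computed by bilinear interpolation
- results are insensitive to whether the sampled values are aggregated with max or average pooling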
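A simplified single-channel sketch of the RoIAlign idea follows. It assumes RoI coordinates are already given in feature-map units, that integer coordinates coincide with feature cell positions, and that sampling points are spaced regularly inside each bin. The names `bilinear`, `roi_align`, `out_size`, `samples`, and `reduce` are illustrative, not taken from the paper.

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate the 2-D array feat at real-valued (y, x)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)        # clamp to the feature map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bottom = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feat, roi, out_size=2, samples=2, reduce=np.mean):
    """Pool an RoI into out_size x out_size bins without any rounding.

    roi: (y1, x1, y2, x2) as floats in feature-map coordinates.
    reduce: np.mean or np.max, applied to the samples of each bin.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size         # real-valued bin size, not quantized
    bin_w = (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    # regularly spaced sampling points inside bin (i, j)
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear(feat, y, x))
            out[i, j] = reduce(vals)
    return out
```

For example, `roi_align(feat, (0.5, 0.5, 2.5, 2.5))` pools a box whose corners fall between feature cells; a RoIPool-style operator would first round those corners to integers, which is the misalignment RoIAlign is meant to avoid.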
presents ablation experiments and an analysis of where the improvements come from