Title: Mask R-CNN
Contribution:
- extends Faster R-CNN by adding a mask branch in parallel with the box branch; the branch enables instance segmentation and also improves detection accuracy
- the mask branch is a small FCN applied to each ROI
- a mask encodes an input object’s spatial layout
- extracting the spatial structure of masks can be addressed naturally by the pixel-to-pixel correspondence provided by convolutions
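- a fully convolutional mask head needs fewer params than fc layers and is more accurate
- shows that decoupling mask and class prediction is essential: the mask branch predicts a binary mask for each class independently, so its loss is the avg per-pixel binary cross-entropy, computed only on the mask of the ground-truth class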
- proposes RoIAlign for predicting pixel-accurate masks
- RoIAlign avoids any quantization of the RoI boundaries or bins
- RoIAlign results are insensitive to the choice of max vs. avg pooling
- reports ablation experiments that analyze where the improvements come from
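
Sketch: two of the points above (the decoupled mask loss and RoIAlign without quantization) can be illustrated in a few lines of plain Python. This is a minimal single-channel toy, not the paper's implementation. The function names (`mask_loss`, `bilinear_sample`, `roi_align`), the nested-list data layout, and the defaults `out_size=2` and `samples=2` are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the ground-truth class mask only.

    mask_logits: list of K masks, each an m x m nested list of logits.
    gt_mask:     m x m nested list of 0/1 targets.
    gt_class:    index k of the RoI's ground-truth class (hypothetical layout).
    """
    logits = mask_logits[gt_class]  # masks of all other classes are ignored
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, gt_mask):
        for z, y in zip(row_logits, row_targets):
            p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)  # clamp for log safety
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at real-valued (y, x) without rounding."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool one RoI to out_size x out_size by averaging bilinear samples.

    roi = (y_start, x_start, y_end, x_end) in real-valued feature-map
    coordinates; neither the RoI boundaries nor the bins are rounded.
    """
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # samples x samples regularly spaced points inside bin (i, j)
            vals = [
                bilinear_sample(
                    feature,
                    y_start + (i + (a + 0.5) / samples) * bin_h,
                    x_start + (j + (b + 0.5) / samples) * bin_w,
                )
                for a in range(samples)
                for b in range(samples)
            ]
            row.append(sum(vals) / len(vals))  # avg pooling over the samples
        out.append(row)
    return out
```

Because `mask_loss` only reads the mask of the ground-truth class, the logits of the other classes contribute nothing to the loss, which is the decoupling the notes refer to. Replacing `sum(vals) / len(vals)` with `max(vals)` in `roi_align` would give the max-pool variant.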