Paper Notes on FCN

Title: Fully Convolutional Networks for Semantic Segmentation

Contribution:

  • replace fully connected layers with convolutions so the network outputs a heatmap that carries pixelwise predictions
  • use a skip structure to fuse fine, low-level appearance features with coarse, high-level semantic features
  • investigated shift-and-stitch and patchwise training for the training phase (neither was adopted)
  • treat upsampling as backwards strided convolution, which is effective for learning dense prediction
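
To make the last point concrete, here is a minimal 1D sketch of upsampling as backwards strided (transposed) convolution. It assumes a fixed bilinear kernel, which is only one possible setting of the filter; in a network the weights could be learned instead. The function names `bilinear_kernel` and `transposed_conv1d` are illustrative and do not come from the paper or any library.

```python
def bilinear_kernel(factor):
    """1D bilinear interpolation weights for an integer upsampling factor."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]


def transposed_conv1d(x, kernel, stride):
    """Backwards strided convolution: each input value scatters a scaled
    copy of the kernel into the output, `stride` positions apart."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


# A ramp stays a ramp after 2x upsampling (away from the zero-padded borders).
ramp = transposed_conv1d([0.0, 1.0, 2.0], bilinear_kernel(2), stride=2)
print(ramp[2:6])  # [0.25, 0.75, 1.25, 1.75]
```

Because the operation is just a linear layer with weights, the kernel can be trained by backpropagation like any other filter, which is what makes it suitable for learning dense prediction.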

Experiment:

  • used VGG16 as the backbone network
  • metrics: pixel accuracy, mean accuracy, mean Intersection over Union (IU), frequency weighted IU, inference time
  • datasets: PASCAL VOC, NYUDv2, SIFT Flow; achieved state-of-the-art results (compared with R-CNN and SDS), with faster inference as well
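
The four accuracy metrics can all be read off a confusion matrix. The sketch below assumes `conf[i][j]` counts pixels of true class i predicted as class j, and that every class occurs at least once in the ground truth (otherwise a division by zero occurs). The helper names are illustrative, not taken from the paper's code.

```python
def confusion_matrix(truth, pred, n_cl):
    """Count pixels of true class i predicted as class j."""
    conf = [[0] * n_cl for _ in range(n_cl)]
    for t, p in zip(truth, pred):
        conf[t][p] += 1
    return conf


def segmentation_metrics(conf):
    """Pixel accuracy, mean accuracy, mean IU and frequency weighted IU."""
    n_cl = len(conf)
    true_count = [sum(row) for row in conf]  # pixels per true class
    pred_count = [sum(conf[i][j] for i in range(n_cl)) for j in range(n_cl)]
    correct = [conf[i][i] for i in range(n_cl)]
    total = sum(true_count)
    # IU per class: intersection / (true + predicted - intersection)
    iu = [correct[i] / (true_count[i] + pred_count[i] - correct[i])
          for i in range(n_cl)]
    return {
        "pixel_acc": sum(correct) / total,
        "mean_acc": sum(correct[i] / true_count[i] for i in range(n_cl)) / n_cl,
        "mean_iu": sum(iu) / n_cl,
        "fw_iu": sum(true_count[i] * iu[i] for i in range(n_cl)) / total,
    }


# Labels are flattened per-pixel class indices.
conf = confusion_matrix([0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
                        [0, 0, 0, 1, 0, 1, 1, 1, 1, 1], n_cl=2)
print(segmentation_metrics(conf))
```

Mean IU weights every class equally, while frequency weighted IU weights each class by its share of pixels, so the two diverge when classes are imbalanced.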

Pros: trained end-to-end; can reuse existing classification nets with little modification to the architecture
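
The reuse of classification nets rests on the observation that a fully connected layer is a convolution whose kernel covers its whole input. A minimal single-channel sketch follows, assuming stride 1 and no padding; `fc` and `conv_from_fc` are hypothetical helper names used only for illustration.

```python
def fc(weights, patch):
    """Fully connected layer on a flattened k x k patch: one score per row."""
    flat = [v for row in patch for v in row]
    return [sum(w * v for w, v in zip(row, flat)) for row in weights]


def conv_from_fc(weights, image, k):
    """Reshape each weight row into a k x k kernel and slide it over the
    image, giving one heatmap per output class."""
    h, w = len(image), len(image[0])
    heatmaps = []
    for row in weights:
        kernel = [row[r * k:(r + 1) * k] for r in range(k)]
        heatmaps.append([
            [sum(kernel[a][b] * image[y + a][x + b]
                 for a in range(k) for b in range(k))
             for x in range(w - k + 1)]
            for y in range(h - k + 1)
        ])
    return heatmaps


weights = [[1, 0, 0, 1], [1, 1, 1, 1]]       # 2 classes, 2 x 2 input
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # larger than the FC input
heatmaps = conv_from_fc(weights, image, k=2)
print(heatmaps[0])  # [[6, 8], [12, 14]]
```

Each heatmap entry equals the FC output on the corresponding 2 x 2 window, so the same weights score every location of an arbitrarily sized input in one pass.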

Future work:

  • a more sophisticated upsampling method than bilinear interpolation
  • why did adding depth information bring only an insignificant improvement?