Title: Fully Convolutional Networks for Semantic Segmentation
Contributions:
- use a fully convolutional network to produce a heatmap as output, which gives a prediction for every pixel
- use a skip architecture to fuse fine, low-level (appearance) features with coarse, high-level (semantic) features
- investigated shift-and-stitch and patchwise training; both were dropped (not used in the final model)
- upsampling implemented as backwards strided convolution (deconvolution), which is learnable and effective for dense prediction
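
Sketch of the upsampling and skip-fusion ideas above, in NumPy. This is a minimal single-channel illustration, not the paper's implementation: the names `bilinear_kernel`, `transposed_conv2d`, and `fuse_skip` are made up for this note, and the 2x factor with a 4x4 kernel is an assumed setting. The kernel is fixed to bilinear weights here; in a network it would be the initial value of a learnable filter.

```python
import numpy as np

def bilinear_kernel(size):
    """2D bilinear interpolation weights for a (size x size) upsampling filter."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)

def transposed_conv2d(x, kernel, stride):
    """Backwards strided convolution: each input pixel stamps a scaled copy
    of the kernel into the output, with stamps placed `stride` pixels apart."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

def fuse_skip(coarse_scores, fine_scores):
    """Upsample the coarse score map 2x, crop it to the fine map's size,
    and sum the two (assumes fine_scores is exactly twice as large)."""
    up = transposed_conv2d(coarse_scores, bilinear_kernel(4), stride=2)
    fh, fw = fine_scores.shape
    # The 4x4 kernel with stride 2 leaves a 1-pixel border on each side.
    return up[1:1 + fh, 1:1 + fw] + fine_scores

coarse = np.ones((3, 3))   # stand-in for deep, coarse class scores
fine = np.zeros((6, 6))    # stand-in for shallower, finer class scores
fused = fuse_skip(coarse, fine)
print(fused.shape)
```

Away from the border, upsampling a constant map returns the same constant, which is a quick sanity check that the kernel weights interpolate rather than rescale.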
Experiment:
- used VGG16 as the backbone network
- metrics: pixel accuracy, mean accuracy, mean intersection over union (IU), frequency-weighted IU, and inference time
- datasets: PASCAL VOC, NYUDv2, SIFT Flow; achieves state-of-the-art results (compared with R-CNN and SDS), with faster inference
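
The four accuracy metrics above can all be computed from one confusion matrix. A minimal NumPy sketch follows, where n[i, j] counts pixels of true class i predicted as class j. The function names are made up for this note, and the code assumes every class occurs at least once in the ground truth (otherwise the per-class ratios divide by zero).

```python
import numpy as np

def confusion_matrix(gt, pred, n_cl):
    """n[i, j] = number of pixels with true class i predicted as class j."""
    idx = gt.ravel() * n_cl + pred.ravel()
    return np.bincount(idx, minlength=n_cl ** 2).reshape(n_cl, n_cl)

def segmentation_metrics(gt, pred, n_cl):
    n = confusion_matrix(gt, pred, n_cl).astype(float)
    t = n.sum(axis=1)                  # pixels per true class
    correct = np.diag(n)               # correctly labelled pixels per class
    union = t + n.sum(axis=0) - correct
    iu = correct / union               # per-class intersection over union
    return {
        "pixel_acc": correct.sum() / t.sum(),
        "mean_acc": np.mean(correct / t),
        "mean_iu": np.mean(iu),
        "fw_iu": (t * iu).sum() / t.sum(),
    }

# Toy 2x2 label maps with two classes; one class-0 pixel is mislabelled.
gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(segmentation_metrics(gt, pred, n_cl=2))
```

Frequency-weighted IU weights each class's IU by its share of ground-truth pixels, so on imbalanced data it is dominated by large classes, whereas mean IU treats every class equally.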
Pros: trained end-to-end; can reuse classification nets with little modification to the architecture
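
Why little modification is needed: a fully connected layer is equivalent to a convolution whose kernel covers its whole input, so its weights can be reshaped into a kernel and slid over larger images. A minimal NumPy sketch, with made-up function names and tiny assumed sizes (3 channels, 2x2 window, 5 outputs):

```python
import numpy as np

def fc_to_conv(fc_weight, channels, k):
    """Reshape FC weights of shape (out, channels*k*k) into conv kernels (out, channels, k, k)."""
    return fc_weight.reshape(fc_weight.shape[0], channels, k, k)

def conv2d_valid(x, kernels):
    """Plain 'valid' cross-correlation. x: (C, H, W); kernels: (O, C, k, k)."""
    o, _, k, _ = kernels.shape
    _, h, w = x.shape
    out = np.zeros((o, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
fc_w = rng.normal(size=(5, 3 * 2 * 2))   # FC layer: 12 inputs -> 5 outputs
kernels = fc_to_conv(fc_w, channels=3, k=2)

# On an input the size of the FC layer's window, both forms agree.
x = rng.normal(size=(3, 2, 2))
print(np.allclose(fc_w @ x.ravel(), conv2d_valid(x, kernels)[:, 0, 0]))

# On a larger input, the convolutional form yields a spatial map of outputs.
x_big = rng.normal(size=(3, 4, 5))
print(conv2d_valid(x_big, kernels).shape)
```

Each position of the output map equals the original classifier applied to the corresponding input window, but the overlapping computation is shared rather than repeated per window.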
Future work:
- a more sophisticated alternative to bilinear upsampling
- why does adding depth information bring only an insignificant improvement?