Title: DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission
Topic: accelerating DNN applications on GPUs via neural pruning and a data precision vs. storage trade-off
Idea: compute correlations between synapse vectors within DNN layers to prune low-contribution vectors, calibrating the remaining weights under the assumption that a neuron's parameters follow the same distribution; analyze the GPU memory-bandwidth bottleneck and address it with data fission methods (see the sketches below)
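A minimal sketch of the correlation-based pruning idea, assuming a fully connected layer whose dense weight matrix stores one synapse vector per column; the function name and the max-correlation scoring heuristic are hypothetical illustrations, not DeftNN's actual algorithm:

```python
import numpy as np

def eliminate_synapse_vectors(W, keep_ratio=0.8):
    """Drop the most redundant columns ("synapse vectors") of a dense
    float32 weight matrix W, returning a smaller, still-dense matrix.
    Hypothetical sketch, not the paper's implementation."""
    n = W.shape[1]
    n_keep = max(1, int(n * keep_ratio))
    # Pairwise correlation between synapse vectors (columns of W);
    # np.corrcoef treats rows as variables, hence the transpose.
    corr = np.abs(np.corrcoef(W.T))
    np.fill_diagonal(corr, 0.0)
    # Score each vector by its strongest correlation with any other one:
    # a highly correlated vector carries largely redundant information.
    redundancy = corr.max(axis=1)
    keep = np.sort(np.argsort(redundancy)[:n_keep])
    # Column slicing keeps the result dense, so ordinary GEMM kernels
    # still apply -- no sparse representation is introduced.
    return W[:, keep], keep

# Example: prune a 256-input, 64-output layer to ~51 synapse vectors.
W = np.random.randn(256, 64).astype(np.float32)
W_small, kept = eliminate_synapse_vectors(W, keep_ratio=0.8)
```

Because whole columns are dropped, the result stays dense and runs on standard GEMM kernels; the surviving weights would then be calibrated under the shared-distribution assumption noted above.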
Contribution: DeftNN, a framework that speeds up DL applications on GPUs; synapse vector elimination, a method that prunes synapses without leaving an inefficient sparse representation; an improved near-compute data fission technique
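A rough illustration of the precision-for-bandwidth trade-off behind near-compute data fission, assuming it is acceptable to truncate low-order mantissa bits; this packing scheme (top 16 bits of each float, two values per 32-bit word) is a simplified stand-in for the paper's actual format:

```python
import numpy as np

def fission_pack(a, b):
    """Pack two float32 arrays into one uint32 array, keeping only the
    top 16 bits (sign, exponent, truncated mantissa) of each value.
    Simplified, hypothetical stand-in for the paper's packing format."""
    hi = a.view(np.uint32) >> 16
    lo = b.view(np.uint32) >> 16
    return (hi << 16) | lo            # two values per 32-bit word

def fission_unpack(packed):
    """Recover approximate float32 values; lost mantissa bits read as 0."""
    a = (packed & 0xFFFF0000).view(np.float32)
    b = ((packed & 0x0000FFFF) << 16).view(np.float32)
    return a, b

# Example: memory traffic is halved at the cost of ~7 mantissa bits.
x = np.random.randn(1024).astype(np.float32)
y = np.random.randn(1024).astype(np.float32)
packed = fission_pack(x, y)
x2, y2 = fission_unpack(packed)
assert np.allclose(x, x2, rtol=1e-2) and np.allclose(y, y2, rtol=1e-2)
```

Halving the bytes moved per value relieves the memory-bandwidth bottleneck; the cheap unpacking is meant to happen "near compute", i.e., just before the values feed the arithmetic units.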
Intellectual merit: avoids sparsity by pruning whole synapse vectors rather than individual weights; analyzes and improves on the GPU hardware memory-bandwidth bottleneck
Strengths: detailed experimental records
Weakness: the six DNN applications evaluated are not well known, and no introduction to the related architectures is given