Title: DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission
Topic: accelerating DNN applications on GPUs via neural pruning and a data precision vs. storage trade-off
Idea: compute correlations between synapse vectors within DNN layers to prune low-contribution vectors, calibrating the remaining weights under the assumption that a neuron's parameters follow the same distribution; analyze the GPU memory-bandwidth bottleneck and address it with data fission methods (see the sketches below)
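A minimal sketch of the correlation-based pruning idea, assuming a fully connected layer whose dense weight matrix stores one synapse vector per column; the function name and the max-correlation scoring heuristic are hypothetical illustrations, not DeftNN's actual algorithm:

```python
import numpy as np

def eliminate_synapse_vectors(W, keep_ratio=0.8):
    """Drop the most redundant columns ("synapse vectors") of a dense
    float32 weight matrix W, returning a smaller, still-dense matrix.
    Hypothetical sketch, not the paper's implementation."""
    n = W.shape[1]
    n_keep = max(1, int(n * keep_ratio))
    # Pairwise correlation between synapse vectors (columns of W);
    # np.corrcoef treats rows as variables, hence the transpose.
    corr = np.abs(np.corrcoef(W.T))
    np.fill_diagonal(corr, 0.0)
    # Score each vector by its strongest correlation with any other one:
    # a highly correlated vector carries largely redundant information.
    redundancy = corr.max(axis=1)
    keep = np.sort(np.argsort(redundancy)[:n_keep])
    # Column slicing keeps the result dense, so ordinary GEMM kernels
    # still apply -- no sparse representation is introduced.
    return W[:, keep], keep

# Example: prune a 256-input, 64-output layer to ~51 synapse vectors.
W = np.random.randn(256, 64).astype(np.float32)
W_small, kept = eliminate_synapse_vectors(W, keep_ratio=0.8)
```

Because whole columns are dropped, the result stays dense and runs on standard GEMM kernels; the surviving weights would then be calibrated under the shared-distribution assumption noted above.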
Contribution: DeftNN, a framework that speeds up DL applications on GPUs; synapse vector elimination, a method that prunes synapses without leaving an inefficient sparse representation; an improved near-compute data fission technique
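A rough illustration of the precision-for-bandwidth trade-off behind near-compute data fission, assuming it is acceptable to truncate low-order mantissa bits; this packing scheme (top 16 bits of each float, two values per 32-bit word) is a simplified stand-in for the paper's actual format:

```python
import numpy as np

def fission_pack(a, b):
    """Pack two float32 arrays into one uint32 array, keeping only the
    top 16 bits (sign, exponent, truncated mantissa) of each value.
    Simplified, hypothetical stand-in for the paper's packing format."""
    hi = a.view(np.uint32) >> 16
    lo = b.view(np.uint32) >> 16
    return (hi << 16) | lo            # two values per 32-bit word

def fission_unpack(packed):
    """Recover approximate float32 values; lost mantissa bits read as 0."""
    a = (packed & 0xFFFF0000).view(np.float32)
    b = ((packed & 0x0000FFFF) << 16).view(np.float32)
    return a, b

# Example: memory traffic is halved at the cost of ~7 mantissa bits.
x = np.random.randn(1024).astype(np.float32)
y = np.random.randn(1024).astype(np.float32)
packed = fission_pack(x, y)
x2, y2 = fission_unpack(packed)
assert np.allclose(x, x2, rtol=1e-2) and np.allclose(y, y2, rtol=1e-2)
```

Halving the bytes moved per value relieves the memory-bandwidth bottleneck; the cheap unpacking is meant to happen "near compute", i.e., just before the values feed the arithmetic units.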
Intellectual merit: avoids sparsity by pruning whole synapse vectors rather than individual weights; analyzes and improves on the GPU hardware memory-bandwidth bottleneck
Strengths: detailed experimental records
Weakness: the six DNN applications evaluated are not well known, and no introduction to the related architectures is given