Paper Note of gScale
Title: gScale: Scaling up GPU Virtualization with Dynamic Sharing of Graphics Memory Space
Topic: a state-of-the-art solution for improving the scalability of GPU virtualization
Idea: analyze the bottleneck of gVirt, namely that fixed-size resource partitioning limits the number of vGPU instances. Building on gVirt, gScale gives each vGPU a private shadow graphics translation table (GTT) so that vGPUs can share the high physical graphics memory, synchronizing only when an instance is rendering and using on-demand copying to reduce cost. It shares the low graphics memory by using the Extended Page Table (EPT) to map guest physical addresses to host physical addresses and linking them directly, so the CPU can access host memory without going through the global graphics memory. It reserves a part of the low graphics memory under dynamic management to serve fence registers, and divides the high graphics memory into slots that vGPUs can share, avoiding context switches for idle vGPUs in order to handle many vGPUs.
Contribution: an open-source GPU virtualization solution that reaches the state-of-the-art standard
Intellectual merit: a private shadow GTT to share the high graphics memory, a mechanism named ladder mapping to share the low graphics memory, a fence memory space pool to solve the invalid fence register problem caused by ladder mapping, and slot sharing to reduce cost when many vGPUs are launched
Strengths: clear logic; the paper analyzes the bottleneck of gVirt and presents its improvements step by step
Weakness: the ladder mapping idea is not clearly illustrated with an example (a toy sketch follows below)
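Since the missing example was the sticking point for me, here is a toy Python sketch of how I understand ladder mapping; the dictionaries standing in for the EPT and GTT are my own simplification, not gScale's actual data structures.

```python
# Toy model of ladder mapping (my simplification, not gScale's code).
# Normally a CPU access walks: guest PA --EPT--> global graphics memory
# address --GTT--> host PA. Ladder mapping performs the GTT translation
# once and rewrites the EPT entry so the guest physical page points at
# the host page directly, bypassing the global graphics memory space.

def ladder_map(ept, gtt, guest_pa):
    ggm_addr = ept[guest_pa]   # step 1: guest PA -> global graphics memory address
    host_pa = gtt[ggm_addr]    # step 2: graphics memory address -> host PA via GTT
    ept[guest_pa] = host_pa    # step 3: shortcut the EPT; CPU access now skips the GGM
    return host_pa

# Tiny worked example with made-up page numbers:
ept = {0x1000: 0x8000}         # guest page 0x1000 mapped into graphics memory
gtt = {0x8000: 0x42000}        # that graphics memory page is backed by host page 0x42000
ladder_map(ept, gtt, 0x1000)
assert ept[0x1000] == 0x42000  # the CPU now reaches host memory directly
```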
Paper Note of EvaluationDLonHPC
Title: Evaluation of Deep Learning Frameworks over Different HPC Architectures
Topic: the training-time performance of three DL frameworks on GPUs and CPUs, with and without the corresponding optimization technologies (NVLink, KNL)
Idea: train different deep neural networks under different combinations of GPU/CPU with and without optimization technologies, different DL frameworks, different training batch sizes, and scale-up/scale-out settings, controlling factors to evaluate the corresponding performance
Contribution: a set of training-time performance benchmarks across different framework-architecture settings
Intellectual merit: novel in its evaluation of NVLink & KNL, and of Caffe, TensorFlow, and SINGA
Strengths: thorough experiments; for example, TensorFlow is run in both scaling situations as a middle reference so that the performance of Caffe and SINGA can be compared indirectly
Weakness: hard to reproduce, given the difficulty of compiling the frameworks and implementing the DNNs
Paper Note of DeftNN
Title: DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission
Topic: accelerating DNN applications on GPUs via synapse pruning and a data precision/storage trade-off
Idea: compute the correlation of DNN synapses to prune low-contribution ones and calibrate the remaining weights, based on the assumption that neuron parameters follow the same distribution; analyze the GPU memory bandwidth bottleneck and propose data fission methods to address it (see the sketch after this note)
Contribution: DeftNN, a framework that speeds up DL applications on GPUs; synapse vector elimination, a method that prunes synapses without leaving an inefficient sparse representation; and an improved data fission technique
Intellectual merit: avoids sparsity via whole-vector pruning rather than simple weight pruning; analyzes and addresses the GPU hardware bandwidth bottleneck
Strengths: detailed experimental records
Weakness: the six DNN applications are not well known, and no introduction to the related architectures is given
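To make the "dense pruning" point concrete, here is a minimal NumPy sketch of the synapse-vector-elimination idea as I understand it; the correlation threshold and the helper name are my own, and the calibration step the paper describes is omitted.

```python
import numpy as np

def eliminate_synapse_vectors(W, X, threshold=0.95):
    """Drop whole input dimensions whose activations are nearly duplicates
    of an already-kept dimension, then shrink the weight matrix.

    W: (out_dim, in_dim) weight matrix; X: (samples, in_dim) activations.
    Returns a smaller dense matrix plus the kept column indices, instead
    of a same-sized sparse matrix full of zeros.
    """
    corr = np.corrcoef(X, rowvar=False)  # correlation between input dimensions
    keep = []
    for j in range(W.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)               # j adds information: keep its synapses
    return W[:, keep], keep

# Usage: prune once offline, then run the smaller dense layer.
W = np.random.randn(4, 6)
X = np.random.randn(100, 6)
X[:, 5] = X[:, 0]                        # make one dimension redundant
W2, keep = eliminate_synapse_vectors(W, X)
y = W2 @ np.random.randn(6)[keep]        # forward pass on the pruned layer
```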
Paper Note of DeepMon
Title: DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
Topic: Reduce deep learning applications’ latency on mobile devices.
Idea: Utilize the mobile device's GPU, convert the deep learning model, store metadata in host memory and parameters in device memory, cache the first N conv layers' outputs and refresh them based on a histogram-similarity policy (sketched after this note), decompose the conv parameters from one tensor into three small tensors, optimize conv operations via unfolding & half floats, and provide multiple kernels to fit different mobile GPUs.
Contribution: A toolkit, DeepMon, that runs deep learning applications on commodity mobile devices with low latency, implemented with OpenCL and Vulkan
Intellectual merit: Proposed optimization methods that reduce latency without significant accuracy loss and save energy, without offloading to more powerful servers in a client-server mode.
Strengths: Detailed observations & experiments at every step. For example, when choosing the number of bins in the caching step, they ran many experiments and plotted how the performance changes.
Weakness: Not enough guidance for DL developers who want to build mobile apps; only training/testing of VGG and YOLO on different datasets is covered
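Here is a small Python sketch of the caching policy described above, reduced to whole-frame histograms; DeepMon compares histograms at a finer granularity, and the class name and threshold here are hypothetical.

```python
import numpy as np

def norm_hist(frame, bins=32):
    """Normalized intensity histogram of a frame (pixel values in 0..255)."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

class ConvCache:
    """Reuse the first N conv layers' output while frames stay similar."""

    def __init__(self, run_first_layers, similarity=0.9):
        self.run_first_layers = run_first_layers  # computes the first N conv layers
        self.similarity = similarity              # histogram-intersection threshold
        self.hist = None
        self.output = None

    def first_layers(self, frame):
        h = norm_hist(frame)
        if self.hist is not None:
            # Histogram intersection is 1.0 for identical distributions.
            if np.minimum(self.hist, h).sum() >= self.similarity:
                return self.output                # similar frame: reuse the cache
        self.hist, self.output = h, self.run_first_layers(frame)
        return self.output

# Usage with a stand-in for the real conv layers:
cache = ConvCache(run_first_layers=lambda f: f.mean())
frame = np.random.randint(0, 256, (224, 224))
out1 = cache.first_layers(frame)
out2 = cache.first_layers(frame)  # identical frame: served from the cache
```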
Binary Search Framework
Use Binary Search to find the first/last element with a property in a sorted array.
l: the first position under consideration
h: the last position under consideration; it can even be larger than the last index of the array
The idea is that if the midpoint satisfies the property, pull the opposite bound in to mid (mid may be the answer); otherwise, move the corresponding bound one step past mid, as sketched below.
Finally, the loop always ends with l == h.
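A minimal Python sketch of the "first element" case (the function and predicate names are mine): `pred` must be monotone over the range, i.e. false for some prefix and true afterwards.

```python
def first_true(l, h, pred):
    """Return the first position in [l, h) where pred holds, or h if none.

    h may be one past the last array index, serving as a 'not found' value.
    """
    while l < h:
        mid = (l + h) // 2
        if pred(mid):
            h = mid        # mid satisfies the property: pull the opposite bound to mid
        else:
            l = mid + 1    # mid fails: move the corresponding bound past mid
    return l               # the loop ends with l == h

# Example: first index whose value is >= 5 in a sorted array.
a = [1, 3, 5, 7, 9]
assert first_true(0, len(a), lambda i: a[i] >= 5) == 2
```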
Install Kaldi on OS X
I installed Kaldi on my own Mac from scratch and encountered some problems.
Here is some guidance for anyone who may have trouble installing Kaldi.
- `git clone https://github.com/kaldi-asr/kaldi.git kaldi-trunk --origin golden`
- `./kaldi-trunk/tools/extras/check_dependencies.sh`
- Install all the dependencies. Notice that `libtoolize` is named `libtool` within `brew`.
- `cd kaldi-trunk/tools; make`
- `cd ../src; ./configure; make`