Paper Note of Learning to Explore for Meta-RL

Posted on 2018-05-07

Title: Some Considerations on Learning to Explore via Meta-Reinforcement Learning

Idea: Modify the mathematical objective so that it takes exploration into consideration, derive two meta-RL algorithms, E-MAML and E-RL², and propose the Krazy World environment to benchmark meta-RL.
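
A rough sketch of the objective change (my paraphrase, not the paper's exact notation): vanilla MAML only optimizes the post-update return, while E-MAML also differentiates through the exploratory rollouts $\bar\tau \sim \pi_\theta$ that the inner update $U$ consumes:

$$\max_\theta \; \mathbb{E}_{\bar\tau \sim \pi_\theta}\Big[\, \mathbb{E}_{\tau \sim \pi_{U(\theta,\,\bar\tau)}}\big[R(\tau)\big] \Big]$$

so the gradient gains a term that rewards exploration leading to better post-update returns.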

Paper Note of Classical Planners for Tasks with Continuous Operators

Posted on 2018-05-03

Title: Using Classical Planners for Tasks with Continuous Operators in Robotics

Idea:

  • Provide an algorithm that combines high-level planning, using classical planners, with low-level execution (see the sketch below):
    • use Skolem symbols in place of continuous operators for high-level, discrete planning,
    • use sampling to estimate the values of the Skolem symbols and RRT-based techniques to implement the movements representing each action in the plan,
    • feed error messages from the low level back to the high-level planner for refinement.
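
A minimal sketch of this plan-refine loop, with the planner, sampler, motion planner, and executor injected as callables; all names here are illustrative, not the paper's API:

```python
def plan_and_execute(task, planner, sample_skolem, motion_plan, execute,
                     max_refinements=10):
    """Interleave discrete planning over Skolem placeholders with
    continuous grounding and execution; low-level failures are fed
    back to the high-level planner as new facts."""
    for _ in range(max_refinements):
        plan = planner(task)                  # e.g. FF / Fast Downward
        failure = None
        for action in plan:
            values = sample_skolem(action)    # sample the Skolem symbols
            path = motion_plan(values)        # RRT-based motion planning
            if path is None:                  # infeasible at the low level
                failure = (action, values)
                break
            execute(path)
        if failure is None:
            return plan                       # every action executed
        task = task.add_fact(failure)         # hypothetical refinement hook
    return None
```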

Contribution:

  • provides a method for synchronizing between continuous and discrete planning layers
  • utilizes classical planners even when facts and operator effects over continuous variables are not available a priori

Experiment:

  • simulated using OpenRAVE
  • Pick and Place Scenario:
    • planner: Fast-Forward (well-established)
    • 30 tests over 3 randomly generated environments
    • 80 to 130 configurations
    • total average time is far less than precomputation-based methods, with an acceptable amount of re-planning
  • Dining Table Set-Up Scenario
    • planner: Fast Downward (cost-sensitive)
    • required 230s, including execution

Paper Note of GRP

Posted on 2018-05-02

Title: Learning Generalized Reactive Policies using Deep Neural Networks

Idea:

  1. Randomly generate a problem set, use an off-the-shelf planner to produce plans and trajectories, and record the observations and actions; use bootstrapping to increase the training data.
  2. Given the training data, use imitation learning (IL) to learn a generalized reactive policy (GRP) represented by a deep neural network.
  3. Furthermore, the GRP can be used as a heuristic function for guided search.
  4. Leapfrogging: use the GRP to generate data for learning more complex GRPs iteratively (see the sketch below).
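
A compact sketch of the data-generation and leapfrogging loop; `planner`, `train`, and `guided_search` are injected callables, and everything here is illustrative rather than the paper's code:

```python
import numpy as np

def build_dataset(problems, planner):
    """Run an off-the-shelf planner on each problem and record the
    (observation, action) pairs along the returned trajectory."""
    obs, acts = [], []
    for prob in problems:
        for o, a in planner(prob):
            obs.append(o)
            acts.append(a)
    return np.array(obs), np.array(acts)

def leapfrog(problem_sets, planner, train, guided_search):
    """Alternate imitation learning and GRP-guided search: each learned
    GRP becomes the heuristic that generates data for the next round."""
    grp = None
    for problems in problem_sets:             # ordered easy -> hard
        X, y = build_dataset(problems, planner)
        grp = train(X, y)                     # imitation learning (IL) step
        planner = lambda p, g=grp: guided_search(p, heuristic=g)
    return grp
```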

Contribution:

  • uses deep learning to obtain the GRP, reducing handcrafted knowledge/features
  • automatically learns a heuristic function
  • the approach can be used in an end-to-end system

Experiments:

  • Sokoban
    • 9k distinct obstacle maps × 5 random start/goal locations
    • performance evaluated both on test domains of the same size the GRPs were trained on (9×9 grids) and on larger problems
  • TSP

Paper Note of TOM

Posted on 2018-04-19

Title: Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

Topic: Data offloading in GPGPU architectures with multiple 3D-stacked memories, to alleviate the memory bandwidth bottleneck.

Idea: Alleviate the memory bandwidth bottleneck in two phases: 1) use a compiler-based technique to identify potential offload candidates by calculating the benefit of offloading, and make the final offload decision at runtime; 2) use a data mapping technique that requires no programmer effort, combining hardware and software and choosing the mapping based on memory access patterns observed over a short period (a toy benefit calculation is sketched below).
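
A toy sketch of the benefit calculation the compiler might perform; the fields and the runtime congestion check are my assumptions, not the paper's exact cost model:

```python
from dataclasses import dataclass

@dataclass
class CandidateBlock:
    bytes_loaded: int      # off-chip read traffic the block would generate
    bytes_stored: int      # off-chip write traffic the block would generate
    live_in_bytes: int     # registers shipped to the memory stack
    live_out_bytes: int    # registers shipped back after execution

def offload_benefit(b: CandidateBlock) -> int:
    """Traffic saved by executing near memory, minus the cost of moving
    the block's live registers across the link."""
    return (b.bytes_loaded + b.bytes_stored) - (b.live_in_bytes + b.live_out_bytes)

def should_offload(b: CandidateBlock, link_utilization: float) -> bool:
    """The compiler marks candidates statically; the runtime only offloads
    when the main GPU-to-memory link is actually the bottleneck."""
    return offload_benefit(b) > 0 and link_utilization > 0.9
```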

Contribution:

  • propose a new data offloading mechanism that statically identifies instruction blocks that benefit from offloading and dynamically decides at runtime whether each candidate should be offloaded
  • propose a new programmer-transparent data mapping mechanism that exploits the predictability of memory access patterns to co-locate offloaded code and data in the same memory stack
  • comprehensively evaluate the mechanisms using 10 memory-intensive GPGPU applications across different system configurations

The results show that their mechanism is a practical and effective approach to enabling programmer-transparent near-data processing in GPU systems.

Paper Note of GPU Performance via ML

Posted on 2018-04-17

Title: GPU Performance and Power Estimation Using Machine Learning

Topic: Using machine learning for GPU performance and power rapid estimation

Idea: For training: vary the GPU configuration along three dimensions (they use three hardware parameters to illustrate the method, which can be extended to more): number of compute units (CUs), memory frequency, and engine frequency, while collecting performance counter data. Use K-means to cluster kernels by how their performance scales (1. with CU count, 2. with memory and engine frequency), then train a neural network to classify kernels into these clusters.
For prediction: run a kernel once on one configuration, classify it into a pattern cluster, and use the corresponding cluster centroid's scaling values to make predictions across various GPU hardware configurations (see the sketch below).
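
A minimal sketch of that pipeline with scikit-learn; the file names, cluster count, and network size are placeholders, not the paper's setup:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# One row per training kernel: its performance scaling curve across the
# swept configurations (CU count, memory freq, engine freq), flattened.
scaling_curves = np.load("scaling_curves.npy")   # hypothetical file
# Performance counters collected from a single run of each kernel.
counters = np.load("perf_counters.npy")          # hypothetical file

kmeans = KMeans(n_clusters=8, random_state=0).fit(scaling_curves)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)
clf.fit(counters, kmeans.labels_)

def predict_scaling(new_counters: np.ndarray) -> np.ndarray:
    """Run a new kernel once, classify its counters into a cluster, and
    reuse that cluster's centroid scaling curve for other configurations."""
    cluster = clf.predict(new_counters.reshape(1, -1))[0]
    return kmeans.cluster_centers_[cluster]
```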

Contribution: demonstrate on real hardware that the performance and power of GPGPU kernels scale in only a limited number of ways as hardware configuration parameters change (a finite set of unique scaling patterns); apply machine learning to estimate the behavior of new kernels; and describe an estimation model that runs a kernel once on one hardware configuration and then predicts performance and power across various configurations.

The results show that the accuracy is comparable to that of cycle-level simulators, and on real hardware the trained model runs as fast as, or faster than, the native programs.

Paper Note of ASAT

Posted on 2018-04-05

Title: Architectural Support for Address Translation on GPUs

Topic: explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs), analyze current designs, and propose enhancements

Idea: In the Heterogeneous System Architecture (HSA), a unified virtual address space over GPU/CPU memory provides programming benefits but demands efficient hardware support, and the GPU's warp-based execution model plays an important role. Since a CPU-like MMU design would hurt GPU performance, they explore cache-conscious warp/wavefront scheduling for the MMU design and how the TLB affects dynamic warp formation. They explore and propose methods in three aspects: address translation for GPUs, cache-conscious warp scheduling, and thread block compaction.

Benefits of their work: removes the need for the CPU to handle the GPU MMU, supports multiple contexts, and supports application libraries.

General insight: default warp scheduling can break temporal locality, while sophisticated warp scheduling can lose its effectiveness.

Paper Note of VGMM

Posted on 2018-04-05

Title: Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU

Topic: The security issues of heterogeneous architectures arising from GPU memory management, and an algorithm to recover data from memory residue.

Idea: Analyze the weaknesses of HSA data security from the GPU perspective and provide an algorithm that can recover images from GPU memory residue across multiuser OSes and cloud computing, in three steps: determine the tilt, use local similarity to identify the size of the image, and remove the leading blocks (see the sketch below).
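
A small sketch of the local-similarity step for recovering the image width from raw residue bytes; the byte layout, candidate range, and scoring are my assumptions:

```python
import numpy as np

def guess_width(residue: np.ndarray, candidates=range(64, 2049), bpp=4):
    """Try candidate widths; the correct one makes neighboring rows
    locally similar (a wrong width shows up as a 'tilted' image)."""
    best_w, best_score = None, -np.inf
    for w in candidates:
        row_bytes = w * bpp
        n_rows = residue.size // row_bytes
        if n_rows < 2:
            continue
        img = residue[: n_rows * row_bytes].reshape(n_rows, row_bytes)
        # Local similarity: small mean difference between adjacent rows
        score = -np.mean(np.abs(np.diff(img.astype(np.int16), axis=0)))
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```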

Paper Note of PPS

Posted on 2018-04-05

Title: Page Placement Strategies for GPUs within Heterogeneous Memory Systems

Topic: improve page placement for GPUs within heterogeneous memory systems

Idea: propose bandwidth-aware (BW-AWARE) placement, which maximizes GPU throughput by controlling the share of pages placed in the bandwidth-optimized (BO) versus capacity-optimized (CO) portions of the aggregated memory, and selectively places hot pages in BO memory when its capacity is limited; profiling augments this by providing information about the hotness and size of data structures (see the sketch below).
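
A toy sketch of BW-AWARE placement under a capacity limit; the proportional split and the hotness field are my simplification of the paper's policy:

```python
def bw_aware_split(pages, bw_bo, bw_co, bo_capacity_pages):
    """Place pages in BO vs. CO memory in proportion to their bandwidths,
    spilling to CO when BO capacity runs out; hottest pages go to BO
    first so the limited BO space serves the most-touched data."""
    target = int(len(pages) * bw_bo / (bw_bo + bw_co))  # ideal BO share
    n_bo = min(target, bo_capacity_pages)
    ranked = sorted(pages, key=lambda p: p["hotness"], reverse=True)
    return ranked[:n_bo], ranked[n_bo:]                 # (BO pages, CO pages)

# Example: 200 GB/s BO vs. 80 GB/s CO => ~71% of pages target BO memory.
bo, co = bw_aware_split([{"id": i, "hotness": i % 7} for i in range(100)],
                        bw_bo=200, bw_co=80, bo_capacity_pages=60)
```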

Contribution: a compiler-based GPU page placement solution

Paper Note of APRES

Posted on 2018-03-30

Title: APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs

Topic: Analyze data cache behavior and improve GPU L1 cache performance through grouped warp priorities combined with prefetching to enhance cache reuse

Idea: Based on the locality of data usage, the Locality Aware Warp Scheduler (LAWS) groups warps that share the same last load PC and assigns priorities for greedy management. All global memory loads are recorded in a Last Load Table, the grouping information is kept in a Warp Group Table, and execution order is scheduled through the Load Store Unit (LSU). Scheduling Aware Prefetching (SAP) demotes a warp that misses to low priority and asks LAWS to give high priority to the prefetch warp for the missing group, using the previously stored load and warp IDs of that group (see the sketch below).
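
A schematic sketch of the two pieces; the warp records and the prefetch interface are hypothetical stand-ins for the hardware tables:

```python
from collections import defaultdict

def group_by_last_load(warps):
    """LAWS-style grouping: warps stalled on the same last load PC tend
    to touch nearby cache lines, so they are scheduled back-to-back."""
    groups = defaultdict(list)
    for w in warps:
        groups[w["last_load_pc"]].append(w["id"])  # Warp Group Table role
    return groups

def on_l1_miss(groups, missing_warp, priorities, issue_prefetch):
    """SAP-style reaction: demote the warp that missed and prefetch on
    behalf of its group-mates, which LAWS then schedules with priority."""
    pc = missing_warp["last_load_pc"]
    priorities[missing_warp["id"]] = 0             # push to lowest priority
    for wid in groups[pc]:
        if wid != missing_warp["id"]:
            issue_prefetch(pc, wid)                # predicted group addresses
            priorities[wid] = max(priorities.get(wid, 1), 2)  # boost group
```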

Contribution: An analysis of load instructions on GPUs and a mechanism named APRES that combines grouped, prioritized warp execution with prefetching.

Intellectual merit: Identifies the locality in cache utilization and the opportunity to prefetch, and proposes a dedicated mechanism for GPU L1 cache optimization based on that observation: LAWS groups warps for priority management, while SAP moves a missing warp to the end and prefetches for its group.

Strengths: Detailed figures and examples illustrating the mechanism

Weakness: Performance results may be limited because evaluation is in simulation rather than on a real GPU

Paper Note of APCM

Posted on 2018-03-29

Title: Access Pattern-Aware Cache Management for Improving Data Utilization in GPU

Topic: A mechanism to improve GPU performance based on data usage patterns

Idea: The GPU memory hierarchy does not support warps well; previous warp-level cache mechanisms are too coarse-grained, so streaming accesses cannot benefit and some instructions with strong temporal locality are forced to bypass. APCM detects the locality type of each load by monitoring an exemplar warp per load and uses the cache tag array to track data sharing, then applies a different strategy to each locality type, with corresponding hardware support. Streaming loads bypass the L1 cache and go directly to L2; inter-warp loads use the LRU strategy; intra-warp loads use a protection algorithm that estimates the lifespan of the data so the locality is exploited until reuse has mostly ended (see the sketch below).
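
A tiny decision-table sketch of the per-load policy; the type labels mirror the note above, while the function shape is my own:

```python
def l1_policy(locality_type: str) -> str:
    """Map the locality type detected from the exemplar warp to the
    cache action applied for that load instruction."""
    return {
        "streaming":  "bypass_L1",        # no reuse: go straight to L2
        "inter_warp": "cache_LRU",        # shared across warps: default LRU
        "intra_warp": "cache_protected",  # protect until estimated reuse ends
    }.get(locality_type, "cache_LRU")
```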

Contribution: An analysis of the weaknesses of previous GPU cache management, and a hardware-based solution for improving GPU L1 cache performance.

Intellectual merit: Uses the notion of access pattern similarity (APS) to measure the consistency of access patterns, analyzes locality types to enable per-load optimization, and uses a protection algorithm to exploit the dominant intra-warp locality.

Strengths: Clearly illustrates the strategies and design of APCM, with detailed experiments showing the performance.

Weakness: Heuristic-based; assumes that the locality type of each load tends not to change.

