Title: APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPU Architectures
Topic: Analyze GPU data-cache behavior and improve L1 cache performance through group-based warp prioritization combined with prefetching to enhance cache reuse
Idea: Based on the locality of data usage, the Locality Aware Warp Scheduler (LAWS) groups warps that share the same last load PC and assigns priorities so the groups can be scheduled greedily. All global memory loads are recorded in a Last Load Table, the grouping information is kept in a Warp Group Table, and the execution order is coordinated with the Load Store Unit (LSU) and Scheduling Aware Prefetching (SAP). On a cache miss, SAP demotes the missing warp's group to low priority and asks LAWS to give high priority to prefetching for that group, using the previous load address and warp IDs stored for the missing group (see the sketch below).
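To make the interplay of LAWS and SAP more concrete, here is a minimal Python sketch of the grouping, greedy priority selection, and miss handling described above. All class and member names (Warp, APRESModel, record_load, on_cache_miss, the fixed-stride address prediction) are illustrative assumptions, not the paper's actual hardware tables or prefetch predictor.

```python
# Minimal sketch of LAWS grouping plus SAP miss handling.
# Names and the stride-based prefetch prediction are assumptions for illustration.

from collections import defaultdict

class Warp:
    def __init__(self, warp_id):
        self.warp_id = warp_id
        self.last_load_pc = None    # PC of the most recent global load
        self.last_load_addr = None  # address of that load (used for prefetch prediction)

class APRESModel:
    def __init__(self, warps):
        self.warps = {w.warp_id: w for w in warps}
        # Warp Group Table analogue: warps sharing the same last load PC form one group.
        self.warp_groups = defaultdict(list)  # last_load_pc -> [warp_id, ...]
        self.group_priority = {}              # last_load_pc -> priority (higher runs first)

    def record_load(self, warp_id, pc, addr):
        """Last Load Table analogue: remember each warp's latest global load."""
        w = self.warps[warp_id]
        if w.last_load_pc is not None:
            # Move the warp out of its old group; drop the group if it empties.
            old_group = self.warp_groups[w.last_load_pc]
            old_group.remove(warp_id)
            if not old_group:
                del self.warp_groups[w.last_load_pc]
                del self.group_priority[w.last_load_pc]
        w.last_load_pc, w.last_load_addr = pc, addr
        self.warp_groups[pc].append(warp_id)
        self.group_priority.setdefault(pc, 0)

    def pick_next_group(self):
        """LAWS: greedily schedule the warp group with the highest priority."""
        if not self.group_priority:
            return None
        pc = max(self.group_priority, key=self.group_priority.get)
        return pc, list(self.warp_groups[pc])

    def on_cache_miss(self, warp_id, stride=128):
        """SAP: demote the missing warp's group and generate prefetches for its peers."""
        pc = self.warps[warp_id].last_load_pc
        self.group_priority[pc] -= 1  # missing group moves toward the back
        base = self.warps[warp_id].last_load_addr
        prefetches = []
        for peer in self.warp_groups[pc]:
            if peer != warp_id:
                # Assumed stride-based prediction from the recorded load address.
                prefetches.append(base + stride * (peer - warp_id))
        return prefetches
```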
Contribution: An analysis of load instructions on GPUs and a mechanism named APRES that combines priority-based execution of warp groups with prefetching.
Intellectual merit: Identifies the locality of cache utilization and the opportunity to prefetch, and proposes a dedicated mechanism for GPU L1 cache optimization built on these observations: LAWS groups warps for priority management, while SAP moves a missing warp's group to the end of the schedule and issues prefetches for it.
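As a quick, hedged walk-through of this demote-and-prefetch flow, the snippet below exercises the hypothetical sketch from the Idea section: four warps share one load PC, warp 0 misses in the L1, its group is demoted, and prefetch addresses are predicted for its peers.

```python
# Hypothetical usage of the APRESModel sketch above (all names assumed).
model = APRESModel([Warp(i) for i in range(4)])
for wid in range(4):
    model.record_load(wid, pc=0x40, addr=0x1000 + 128 * wid)

print(model.pick_next_group())  # (64, [0, 1, 2, 3]) -- the single group at PC 0x40
print(model.on_cache_miss(0))   # [4224, 4352, 4480] -- predicted addresses for warps 1-3
print(model.group_priority)     # {64: -1} -- the missing group is pushed to the back
```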
Strengths: Detailed figures with worked examples that illustrate the mechanism
Weakness: The evaluation is simulation-only with no real GPU environment, so the reported performance gains may not be fully representative