Programming Abstractions and Optimization Techniques for GPU-based Heterogeneous Systems

Abstract: CPU/GPU heterogeneous systems have shown remarkable advantages in performance and energy consumption compared to homogeneous systems such as standard multi-core systems. Such heterogeneity represents one of the most promising trends for the near-future evolution of high-performance computing hardware. However, as a double-edged sword, the heterogeneity also brings significant programming complexities that prevent the easy and efficient use of such heterogeneous systems. In this thesis, we are interested in four fundamental kinds of complexity associated with these systems: measurement complexity (the effort required to measure a metric, e.g., energy), CPU-GPU selection complexity, platform complexity, and data management complexity. We explore new low-cost programming abstractions to hide these complexities, and propose new optimization techniques that can be performed under the hood.

For the measurement complexity, although measuring time is trivial with native library support, measuring energy consumption, especially on systems with GPUs, is complex because of the low-level details involved, such as choosing the right measurement method, handling the trade-off between sampling rate and accuracy, and switching between different measurement metrics. We propose a clean interface, together with an implementation, that not only hides the complexity of energy measurement but also unifies different kinds of measurements. The unification bridges the gap between time measurement and energy measurement: if no metric-specific assumptions tied to time optimization techniques are made, energy optimization can be performed by directly reusing time optimization techniques.

For the CPU-GPU selection complexity, which relates to efficient utilization of heterogeneous hardware, we propose a new adaptive-sampling-based mechanism for constructing selection predictors; it adapts automatically to different hardware platforms and shows non-trivial advantages over random sampling.

For the platform complexity, we propose a new modular platform modeling language, with its implementation, to formally and systematically describe a computer system, enabling zero-overhead platform information queries for high-level software tool chains and for programmers as a basis for making software adaptive.

For the data management complexity, we propose a new mechanism that provides a unified memory view on heterogeneous systems with separate memory spaces. This mechanism lets programmers write significantly less code that runs as fast as expert-written code and outperforms the current commercially available solution, Nvidia's Unified Memory. We further propose two data movement optimization techniques, lazy allocation and transfer fusion, both based on adaptively merging messages to reduce data transfer latency. We show that these techniques can be beneficial, and we prove that our greedy fusion algorithm is optimal.

Finally, we show that our approaches to handling the different complexities can be combined, so that programmers can use them simultaneously. This research was partly funded by two EU FP7 projects (PEPPHER and EXCESS) and by SeRC.
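To illustrate the idea of unifying time and energy measurement behind one abstraction, the following is a minimal C++ sketch. All names (Meter, TimeMeter, EnergyMeter, measure) are illustrative assumptions and do not correspond to the thesis' actual interface; the point is only that tuning code written against the common interface can be reused for either metric.

```cpp
#include <chrono>
#include <functional>
#include <iostream>

// Common interface: start/stop a measurement and read back a scalar cost.
class Meter {
public:
    virtual ~Meter() = default;
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual double value() const = 0;   // seconds or joules, depending on meter
};

// Time measurement: trivially supported by the standard library.
class TimeMeter : public Meter {
    std::chrono::steady_clock::time_point begin_;
    double seconds_ = 0.0;
public:
    void start() override { begin_ = std::chrono::steady_clock::now(); }
    void stop() override {
        seconds_ = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - begin_).count();
    }
    double value() const override { return seconds_; }
};

// Energy measurement: a real implementation would read hardware sensors
// (e.g. CPU power counters, GPU power via NVML); here it is only a stub.
class EnergyMeter : public Meter {
    double joules_ = 0.0;
public:
    void start() override { /* begin sampling power sensors */ }
    void stop() override  { /* integrate power samples into joules_ */ }
    double value() const override { return joules_; }
};

// Metric-agnostic helper: any optimization loop written against Meter
// works unchanged whether it is handed a TimeMeter or an EnergyMeter.
double measure(Meter& m, const std::function<void()>& work) {
    m.start();
    work();
    m.stop();
    return m.value();
}

int main() {
    TimeMeter t;
    double cost = measure(t, [] { /* kernel or code region under study */ });
    std::cout << "cost: " << cost << "\n";
}
```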
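The transfer fusion idea can likewise be sketched under a simple assumed cost model: each CPU-GPU transfer pays a fixed latency plus a per-byte cost, and fusing two adjacent transfers also copies the unused gap between them, trading extra bytes for one saved latency. The constants, cost model, and greedy rule below are illustrative assumptions, not the thesis' actual formulation or its optimality proof.

```cpp
#include <vector>
#include <cstdio>

struct Transfer { size_t offset; size_t bytes; };  // contiguous region to copy

constexpr double ALPHA = 10e-6;   // assumed per-transfer latency (s)
constexpr double BETA  = 0.1e-9;  // assumed per-byte transfer cost (s/byte)

// Greedily fuse adjacent transfers (sorted by offset, non-overlapping)
// whenever the saved latency outweighs the cost of copying the gap.
std::vector<Transfer> fuse(const std::vector<Transfer>& in) {
    std::vector<Transfer> out;
    for (const Transfer& t : in) {
        if (!out.empty()) {
            Transfer& last = out.back();
            size_t gap = t.offset - (last.offset + last.bytes);
            if (ALPHA > BETA * static_cast<double>(gap)) {
                last.bytes = (t.offset + t.bytes) - last.offset;  // merge
                continue;
            }
        }
        out.push_back(t);
    }
    return out;
}

int main() {
    std::vector<Transfer> transfers = {{0, 4096}, {8192, 4096}, {1 << 20, 4096}};
    for (const Transfer& t : fuse(transfers))
        std::printf("copy %zu bytes at offset %zu\n", t.bytes, t.offset);
}
```

With the example above, the first two transfers are close enough that merging them is profitable, while the third remains separate because copying the large gap would cost more than the saved per-transfer latency.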
