Efficient methods for application performance analysis

Detta är en avhandling från Uppsala universitet

Sammanfattning: To reduce latency and increase bandwidth to memory, modern microprocessors are designed with deep memory hierarchies including several levels of caches. For such microprocessors, the service time for fetching data from off-chip memory is about two orders of magnitude longer than fetching data from the level-one cache. Consequently, the performance of applications is largely determined by how well they utilize the caches in the memory hierarchy, captured by their miss ratio curves. However, efficiently obtaining an application's miss ratio curve and interpreting its performance implications is hard. This task becomes even more challenging when analyzing application performance on multicore processors where several applications/threads share caches and memory bandwidths. To accomplish this, we need powerful techniques that capture applications' cache utilization and provide intuitive performance metrics.In this thesis we present three techniques for analyzing application performance, StatStack, StatCC and Cache Pirating. Our main focus is on providing memory hierarchy related performance metrics such as miss ratio, fetch ratio and bandwidth demand, but also execution rate. These techniques are based on profiling information, requiring both runtime data collection and post processing. For such techniques to be broadly applicable the data collection has to have minimal impact on the profiled application, allow profiling of unmodified binaries, and not depend on custom hardware and/or operating system extensions. Furthermore, the information provided has to be accurate and easy to interpret by programmers, the runtime environment and compilers.StatStack estimates an application's miss ratio curve, StatCC estimates the miss ratio of co-running application sharing the last-level cache and Cache Pirating measures any desired performance metric available through hardware performance counters as a function of cache size. We have experimentally shown that our methods are both efficient and accurate. The runtime information required by StatStack and StatCC can be collected with an average runtime overhead of 40%. The Cache Pirating method measures the desired performance metrics with an average runtime overhead of 5%.