Aggregation as Unsupervised Learning in Software Engineering and Beyond

Abstract: Ranking alternatives is fundamental to effective decision making. However, creating an overall ranking is difficult when there are multiple criteria and no single alternative performs best across all of them. Software engineering is no exception.
Software quality is usually decomposed hierarchically into characteristics, and their quality can be assessed by various direct and indirect metrics. Although such quality models provide a basic understanding of what data to collect and which metrics to use, it is not clear how the metrics should be combined to assess overall quality. Because approaches to aggregating metrics differ, the same quality model and the same metrics applied to the same software artifact can still lead to different assessment results and even to different interpretations.
The aggregation approach proposed in this thesis is well-defined, interpretable, and applicable under realistic conditions. It can turn the quality-model-based and metric-based assessment of (software) quality into a reliable and reproducible process. We express quality as the probability of detecting something with equal or worse quality, based on all software artifacts observed; good and bad quality are expressed in terms of lower and higher probabilities. We validated our approach theoretically and empirically. We conducted empirical studies on bug prediction, maintainability assessment, and information quality.
We used software visualization to analyze the usability of aggregation for analyzing multivariate data in general, and the effect of alternative aggregation approaches in particular. To that end, we designed and implemented an exploratory multivariate data visualization tool.
Finally, we applied our approach to multi-criteria ranking to evaluate its transferability to other domains. We evaluated it on a real-world decision-making problem involving the assessment and ranking of alternatives.
Moreover, we applied our approach in the context of machine learning. We created a benchmark from a collection of regression problems and evaluated how well the aggregation output agrees with a ground truth and how well it represents the properties of the input variables.
The results showed that our approach is not only theoretically sound but also accurate and sensitive, identifies anomalies, scales in performance, and can support multi-criteria decision making. Furthermore, our approach is transferable to other domains that require aggregation in hierarchically structured models, and it can be used as an agnostic unsupervised predictor in the absence of a ground truth.
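The idea of expressing quality as the probability of observing something with equal or worse quality can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the probability is estimated as a joint empirical frequency over all observed artifacts, and the function name, the `higher_is_worse` flags, and the sample data are hypothetical.

```python
from typing import Sequence


def equal_or_worse_probability(
    observations: Sequence[Sequence[float]],
    higher_is_worse: Sequence[bool],
) -> list[float]:
    """For each artifact, return the fraction of observed artifacts that are
    equal or worse on every metric (an assumed joint empirical estimate)."""
    n = len(observations)
    scores = []
    for a in observations:
        count = 0
        for b in observations:
            # b counts if it is equal or worse than a on all metrics,
            # where "worse" depends on the direction of each metric.
            if all(
                (vb >= va) if worse_up else (vb <= va)
                for va, vb, worse_up in zip(a, b, higher_is_worse)
            ):
                count += 1
        scores.append(count / n)
    return scores


# Hypothetical artifacts described by (complexity, test coverage):
# higher complexity is assumed worse, higher coverage is assumed better.
artifacts = [(5, 0.9), (10, 0.7), (20, 0.4), (10, 0.8)]
scores = equal_or_worse_probability(artifacts, higher_is_worse=[True, False])
print(scores)
```

Because the score is a frequency over the observed artifacts, no ground truth or hand-tuned weights are needed, which is what makes this kind of aggregation usable as an unsupervised predictor.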
