On Uncertainty Quantification in Neural Networks: Ensemble Distillation and Weak Supervision

Abstract: Machine learning models are employed in several aspects of society, ranging from autonomous cars to justice systems. They affect your everyday life, for instance through recommendations on your streaming service and by informing decisions in healthcare, and are expected to have even more influence in society in the future. Among these machine learning models we find neural networks, which have had a wave of success within a wide range of fields in recent years. The success of neural networks is partly attributed to the very flexible model structure and seemingly endless possibilities in terms of extensions.

While neural networks come with great flexibility, they are so-called black-box models and therefore offer little in terms of interpretability. In other words, it is seldom possible to explain or even understand why a neural network makes a certain decision. On top of this, these models are known to be overconfident, which means that they attribute low uncertainty to their predictions even when the uncertainty is, in reality, high. Previous work has demonstrated how this issue can be alleviated with the help of ensembles, i.e. by weighing the opinions of multiple models in prediction. In Paper I, we investigate this possibility further by creating a general framework for ensemble distribution distillation, developed for the purpose of preserving the performance benefits of ensembles while reducing computational costs. Specifically, we extend ensemble distribution distillation to make it applicable to tasks beyond classification and demonstrate the usefulness of the framework in, for example, out-of-distribution detection.

Another obstacle in the use of neural networks, especially deep neural networks, is that supervised training of these models can require a large amount of labelled data. The process of annotating a large amount of data is costly, time-consuming and also prone to errors. Specifically, there is a risk of incorporating label noise in the data. In Paper II, we investigate the effect of label noise on model performance. In particular, under an input-dependent noise model, we analyse the properties of the asymptotic risk minimisers of strictly proper loss functions and of a set of previously proposed robust loss functions. The results demonstrate that reliability, in terms of a model’s uncertainty estimates, is an important aspect to consider also in weak supervision and, particularly, when developing noise-robust training algorithms.

Related to annotation costs in supervised learning is the use of active learning to optimise model performance under budget constraints. The goal of active learning, in this context, is to identify and annotate the observations that are most useful for the model’s performance. In Paper III, we propose an approach for taking advantage of intentionally weak annotations in active learning. More specifically, we propose to incorporate the possibility of collecting cheaper, but noisy, annotations into the active learning algorithm. Thus, the same annotation budget is enough to annotate more data points for training and, in turn, the model gets to explore a larger part of the input space. We demonstrate empirically how this can lead to gains in model performance.
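To make the ensemble idea referred to above concrete, the following is a minimal, illustrative sketch (not taken from the thesis) of how an ensemble of classifiers can be combined by averaging the members' predictive distributions, with predictive entropy used as a simple uncertainty measure. The toy logits and dimensions are hypothetical stand-ins for the outputs of trained neural networks.

```python
# Minimal sketch: ensemble averaging and entropy-based uncertainty.
# The random "member logits" below are hypothetical placeholders for
# the outputs of M independently trained neural networks.
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    """Convert logits to class probabilities (numerically stable)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# M ensemble members, N inputs, K classes.
M, N, K = 5, 3, 4
member_logits = rng.normal(size=(M, N, K))

# Each member's predictive distribution p_m(y | x).
member_probs = softmax(member_logits)         # shape (M, N, K)

# Ensemble prediction: the average of the members' distributions.
ensemble_probs = member_probs.mean(axis=0)    # shape (N, K)

# Predictive entropy of the averaged distribution: one simple
# per-input measure of (total) predictive uncertainty.
entropy = -(ensemble_probs * np.log(ensemble_probs + 1e-12)).sum(axis=-1)

print("ensemble probabilities:\n", ensemble_probs)
print("predictive entropy per input:", entropy)
```

Ensemble distribution distillation, as studied in Paper I, then aims to retain this kind of ensemble-based uncertainty information in a single model, so that the cost of evaluating all M members at prediction time can be avoided.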
