Towards Scalable Machine Learning with Privacy Protection

Sammanfattning: The increasing size and complexity of datasets have accelerated the development of machine learning models and exposed the need for more scalable solutions. This thesis explores challenges associated with large-scale machine learning under data privacy constraints. With the growth of machine learning models, traditional privacy methods such as data anonymization are becoming insufficient. Thus, we delve into alternative approaches, such as differential privacy.Our research addresses the following core areas in the context of scalable privacy-preserving machine learning: First, we examine the implications of data dimensionality on privacy for the application of medical image analysis. We extend the classification algorithm Private Aggregation of Teacher Ensembles (PATE) to deal with high-dimensional labels, and demonstrate that dimensionality reduction can be used to improve privacy. Second, we consider the impact of hyperparameter selection on privacy. Here, we propose a novel adaptive technique for hyperparameter selection in differentially gradient-based optimization. Third, we investigate sampling-based solutions to scale differentially private machine learning to dataset with a large number of records. We study the privacy-enhancing properties of importance sampling, highlighting that it can outperform uniform sub-sampling not only in terms of sample efficiency but also in terms of privacy.The three techniques developed in this thesis improve the scalability of machine learning while ensuring robust privacy protection, and aim to offer solutions for the effective and safe application of machine learning in large datasets.

  KLICKA HÄR FÖR ATT SE AVHANDLINGEN I FULLTEXT. (PDF-format)