Pointwise Maximal Leakage: Robust, Flexible and Explainable Privacy

Abstract: For several decades, safeguarding sensitive information from disclosure has been a key focus in computer science and information theory. In the past two decades especially, privacy has received significant attention due to the widespread collection and processing of data in many facets of society. A central question in this area is: "What can be inferred about individuals from the data collected from them?" This doctoral thesis undertakes a foundational and application-agnostic exploration of the theory of privacy. The overarching objective is to construct a comprehensive framework for evaluating and designing privacy-preserving data processing systems that adheres to three essential criteria.

Explainability. The notion of information leakage (or privacy loss) employed in this framework should be operationally meaningful; that is, it should naturally emerge from the analysis of adversarial attack scenarios. Privacy guarantees within this framework should be comprehensible to stakeholders, and the associated privacy parameters should be meaningful and interpretable.

Robustness. The notion of information leakage employed should demonstrate resilience against a diverse array of potential adversaries, accommodating a broad range of attack scenarios while refraining from making restrictive assumptions about adversarial capabilities.

Flexibility. The framework should offer value in a variety of contexts, catering both to highly privacy-sensitive applications and to those with more relaxed privacy requirements. The notion of information leakage employed should also be applicable to various data types.

The privacy notion proposed in this thesis that satisfies all of the above criteria is called pointwise maximal leakage (PML). PML is a random variable that measures the amount of information leaking about a secret random variable X to a publicly available, related random variable Y; a finite-alphabet sketch of this quantity is given below. We first develop PML for finite random variables by studying two seemingly different but mathematically equivalent adversarial setups: the randomized function model and the gain function model. We then extend the gain function model to random variables on arbitrary probability spaces to obtain a more general form of PML. Furthermore, we study the properties of PML in terms of pre- and post-processing inequalities and composition, define various privacy guarantees, and compare PML with existing privacy notions from the literature, including differential privacy and its local variant.

PML is, by definition, an inferential privacy measure in the sense that it compares an adversary's posterior knowledge about X with her prior knowledge. However, a prevalent misconception in the area suggests that meaningful inferential privacy guarantees are unattainable, due to an over-interpretation of a result known as the impossibility of absolute disclosure prevention. Through a pivotal shift in perspective, we characterize precisely which types of disclosure can be prevented by privacy guarantees and which remain inevitable. In this way, we argue in favor of inferential privacy measures.

On the more application-oriented front, we examine Private Aggregation of Teacher Ensembles (PATE), a common framework for privacy-preserving machine learning, through the lens of an information-theoretic privacy measure.
Specifically, we propose a conditional form of maximal leakage to quantify the amount of information leaking about individual data entries, and we prove that the leakage is Schur-concave when the injected noise has a log-concave probability density. The Schur-concavity of the leakage implies that increased classification accuracy improves privacy. We also derive upper bounds on the information leakage when the injected noise follows a Laplace distribution.

Finally, we design optimal privacy mechanisms that minimize Hamming distortion subject to maximal leakage constraints, assuming that (i) the data-generating distribution (i.e., the prior) is known, or (ii) the prior belongs to a certain set of possible distributions. We prove that sets of priors containing more "uniform" distributions generate larger distortion. We also prove that privacy mechanisms that distribute the privacy budget more uniformly over the outcomes create smaller worst-case distortion.
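For readers unfamiliar with the quantity, the following is a minimal sketch of how pointwise maximal leakage and maximal leakage are commonly written for finite alphabets, consistent with the description above; the notation is illustrative and may differ from that used in the thesis.

```latex
% Pointwise maximal leakage of an outcome y about the secret X (finite alphabets):
\ell(X \to y) \;=\; \log \max_{x:\, P_X(x) > 0} \frac{P_{X \mid Y}(x \mid y)}{P_X(x)}

% Averaging the exponentiated pointwise leakage over the outcomes recovers maximal leakage:
\mathcal{L}(X \to Y) \;=\; \log \mathbb{E}_Y\!\left[ e^{\ell(X \to Y)} \right]
\;=\; \log \sum_{y} \max_{x:\, P_X(x) > 0} P_{Y \mid X}(y \mid x)
```

The first expression makes the inferential character explicit: it compares the adversary's posterior belief about X after observing the outcome y with her prior belief, evaluated at the worst-case x.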
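As a purely illustrative complement (not code or results from the thesis), the short Python sketch below evaluates these finite-alphabet formulas for a toy binary randomized-response mechanism; the function names, the prior, and the channel are assumptions made here for the example.

```python
import numpy as np

def pml_per_outcome(p_x, p_y_given_x):
    """Pointwise leakage log max_x P(x|y)/P(x) for each outcome y (natural log)."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(p_y_given_x, dtype=float)   # rows: x, columns: y
    p_y = p_x @ channel                              # marginal P(y)
    # Bayes: P(x|y)/P(x) = P(y|x)/P(y), so only the channel and P(y) are needed.
    ratios = channel / p_y[None, :]
    return np.log(ratios.max(axis=0))                # one leakage value per y

def maximal_leakage(p_y_given_x):
    """Maximal leakage log sum_y max_x P(y|x), assuming a full-support prior."""
    channel = np.asarray(p_y_given_x, dtype=float)
    return np.log(channel.max(axis=0).sum())

# Toy example: binary randomized response that reports the true bit w.p. 0.75.
prior = [0.5, 0.5]
channel = [[0.75, 0.25],
           [0.25, 0.75]]
print(pml_per_outcome(prior, channel))   # per-outcome leakage for y = 0 and y = 1
print(maximal_leakage(channel))          # log(0.75 + 0.75) = log 1.5
```

In this toy example both outcomes leak log 1.5 ≈ 0.405 nats and the maximal leakage coincides with that value; changing the prior alters the per-outcome leakage, whereas maximal leakage remains log 1.5 for any full-support prior.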
