A Further Step of Causal Discovery towards Real-World Impacts

Abstract: The goal of many sciences is to find causal relationships and understand underlying mechanisms. Randomized experiments are the gold standard for finding causal relationships, but they can be difficult or impossible in some applications; hence, determining underlying causal relationships purely from observational data, i.e., causal discovery, has attracted growing attention in many domains, such as earth science, biology, and healthcare. On the one hand, computational methods of causal discovery have been developed and improved significantly over the past three decades. On the other hand, many challenges remain in both practice and theory before causal discovery can achieve further real-world impact. This thesis introduces the typical methods and challenges of causal discovery and then elaborates on the contributions of the included papers, which take a step towards more real-world impact for causal discovery. It mainly covers four challenges: practical issues, understanding and generalizing the restrictive assumptions, the lack of benchmark data sets, and applications of causality in machine learning topics. Each included paper contributes to one of the challenges.

The first paper addresses causal discovery in the presence of missing data, one of the practical issues. We theoretically study the influence of missing values on causal discovery methods and then correct the errors in their results. Under mild assumptions, our proposed method provides asymptotically correct results.

In the second paper, we investigate the assumptions behind a class of causal discovery methods. Such methods impose substantial constraints on the functional classes and distributions of causal processes in order to determine causal relationships; however, these constraints are restrictive and not well understood. Therefore, we introduce a new dynamical-system view for understanding the methods and their constraints by connecting optimal transport and causal discovery. Furthermore, we provide a causal discovery criterion and a robust optimal transport-based algorithm.

The third paper discusses the evaluation of causal discovery methods. Evaluating causal discovery methods on synthetic data generated from random causal graphs is too simplistic, while real-world benchmark data sets with ground-truth causal relations, which always involve practical issues, are in great demand. Thus, we create a neuropathic pain diagnosis simulator based on real-world patient records and domain knowledge. The simulator provides ground-truth causal relations and generates simulation data that a medical expert cannot distinguish from real data.

Finally, we explore an application of causality: fairness in machine learning. Much work on fairness is based on constraints on static statistical measures across different demographic groups. It turns out that decisions made under such constraints can have a pernicious long-term impact on the disadvantaged group. Therefore, we consider the underlying causal processes, theoretically analyze the equilibrium states of dynamical systems under various fairness constraints, show the impact of these constraints on the equilibrium states, and introduce potentially effective interventions to improve the equilibrium states.
