Together We Learn More: Algorithms and Applications for User-Centric Anomaly Detection

Abstract: Anomaly detection is the problem of identifying data points or patterns that do not conform to normal behavior. Anomalies in data often correspond to important and actionable information such as fraud in financial applications, faults in production units, intrusions in computer systems, and serious diseases in patient records. One of the fundamental challenges of anomaly detection is that the exact notion of an anomaly is subjective and varies greatly across applications and domains. This makes it difficult to distinguish anomalies that match the end-user's expectations from other observations. As a result, anomaly detectors produce many false alarms that do not correspond to semantically meaningful anomalies for the analyst. Humans can help bridge this gap between detected anomalies and "anomalies-of-interest" in different ways: by giving clues about which features are more likely to reveal interesting anomalies, or by providing feedback that separates them from irrelevant ones. However, it is not realistic to expect a human to provide feedback easily unless the algorithm explains why it classifies a certain sample as an anomaly. Interpretability of results is crucial for an analyst to be able to investigate a candidate anomaly and decide whether it is actually interesting. In this thesis, we take a step toward improving the practical use of anomaly detection in real life by leveraging human-algorithm collaboration. This thesis and the appended papers study the problem of formulating and implementing algorithms for user-centric anomaly detection: a setting in which people analyze, interpret, and learn from the detector's results, as well as provide domain knowledge or feedback. Throughout this thesis, we describe a number of diverse approaches, each addressing different challenges and needs of user-centric anomaly detection in the real world, and combine these methods into a coherent framework.
By conducting different studies, this thesis finds that a comprehensive approach that incorporates human knowledge and provides interpretable results can lead to more effective and practical anomaly detection and more successful real-world applications. The major contributions that result from the studies included in this work, and that led to the above conclusion, can be summarized in five categories: (1) exploring different data representations that are suitable for anomaly detection based on data characteristics and domain knowledge, (2) discovering patterns and groups in data that describe normal behavior in the current application, (3) implementing a generic and extensible framework enabling use-case-specific detectors suitable for different scenarios, (4) incorporating domain knowledge and expert feedback into anomaly detection, and (5) producing interpretable detection results that support end-users in understanding and validating the anomalies.
