Z-Series: Mining and learning from complex sequential data

Abstract: The amount and complexity of sequential data collected across various domains have grown rapidly, posing significant challenges for extracting useful knowledge from such data sources. The complexity arises from diverse sequence representations with varying granularities, such as multivariate time series, histogram snapshots, and heterogeneous health records, which often describe a single data instance with multiple sequences. Due to this complexity, the underlying temporal relations between sequences may not be clear and can change over time, making knowledge discovery even more challenging. To address these challenges, this thesis proposes event intervals as a unified representation for complex sequential data. Event intervals capture the underlying temporal relations between sequences by comparing the relative locations of event intervals in both the time and value dimensions, making them suitable for describing diverse sequential data. The proposed artifacts aim to efficiently and effectively discover patterns of interest, transform sequential data in different application domains through temporal abstraction, and provide interpretable features for machine learning tasks without compromising performance. The effectiveness of the proposed artifacts is evaluated through empirical experiments and practical evaluations, which demonstrate their applicability and performance. The thesis is structured into three parts. First, it introduces state-of-the-art frameworks for mining event interval sequences, including frequent arrangement mining, classification, and clustering. The utility of these frameworks is demonstrated through comparative empirical evaluations against other frameworks. Second, the thesis applies temporal abstraction to complex sequential data in different application domains, showcasing its applicability through tasks such as disproportionality analysis and local grouping detection for time series.
Lastly, event intervals are used as interpretable features for learning tasks, outperforming competitive algorithms that use different feature representations. This part focuses on univariate and multivariate time series, with extensive experiments on publicly available benchmark datasets, assessed with statistical tests.
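
To make the idea of temporal abstraction more concrete, the following Python sketch turns a numeric series into labeled event intervals and then names a coarse temporal relation between two of them. It is only an illustration under stated assumptions: the names `EventInterval`, `abstract_series`, and `relation`, the fixed thresholds, and the simplified four-relation set are hypothetical and are not taken from the thesis, whose own methods may differ substantially.

```python
import bisect
from typing import List, NamedTuple, Sequence


class EventInterval(NamedTuple):
    """A labeled interval over sample indices (both ends inclusive)."""
    label: str
    start: int
    end: int


def abstract_series(values: Sequence[float],
                    thresholds: Sequence[float],
                    labels: Sequence[str]) -> List[EventInterval]:
    """Map each sample to a value bin, then merge consecutive samples
    that share a bin into one event interval.

    Assumes `thresholds` is sorted ascending and
    len(labels) == len(thresholds) + 1 (hypothetical convention).
    """
    intervals: List[EventInterval] = []
    for t, v in enumerate(values):
        # Values equal to a threshold fall into the upper bin.
        label = labels[bisect.bisect_right(thresholds, v)]
        if intervals and intervals[-1].label == label:
            # Same bin as the previous sample: extend the current interval.
            intervals[-1] = intervals[-1]._replace(end=t)
        else:
            intervals.append(EventInterval(label, t, t))
    return intervals


def relation(a: EventInterval, b: EventInterval) -> str:
    """Coarse temporal relation of `a` to `b` (simplified, illustrative set).

    Requires a.start <= b.start. "overlaps" here covers every partial
    overlap, including the case where both start together but `a` ends first.
    """
    if a.start > b.start:
        raise ValueError("expected a.start <= b.start")
    if a.end < b.start:
        return "before"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.end >= b.end:
        return "contains"
    return "overlaps"


# Two toy series describing the same data instance.
series_a = abstract_series([1, 2, 8, 9, 9, 3], [5], ["low", "high"])
series_b = abstract_series([0, 0, 0, 7, 7, 7], [5], ["low", "high"])
high_a = series_a[1]  # EventInterval("high", 2, 4)
high_b = series_b[1]  # EventInterval("high", 3, 5)
high_relation = relation(high_a, high_b)  # "overlaps"
```

In this toy setup, each series becomes a short list of intervals, and the pairwise relations between intervals from different series are the kind of information an arrangement miner or an interval-based feature extractor could build on.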
