Inference of Effective Pairwise Relations for Data Processing

Abstract: In many areas of data science and artificial intelligence, representation learning is a performance-critical step. While different representation learning methods can detect different descriptive and latent features, many of them rely on pairwise relations. This thesis consists of two parts, studying pairwise relations from two points of view: i) pairwise relations between the states of a Markov chain, and ii) pairwise relations between objects in a dataset based on a desired (dis)similarity measure.

In the first part of the thesis, we consider Markov chains, noting that pairwise relations between their states are naturally modeled by the state-transition matrix. We propose a method for modeling the performance of a synchronization method for a multi-processor architecture. Our model introduces and builds upon a cache line bouncing process that models the interaction of threads accessing shared cache lines.

In the second part of the thesis, we consider representation learning using the transitive-aware Minimax distance, which enables the extraction of elongated manifolds and structures in the data. While recent work has made Minimax distances computationally feasible, little attention has been paid to their memory footprint, which is naturally O(N^2): the cost of storing all pairwise distances. We instead compute a novel hierarchical representation of the data that requires O(N) memory and from which pairwise Minimax distances can be efficiently inferred, so the total memory requirement is O(N), at the price of a higher computational cost. We also derive an alternative sampling-based approach that computes approximate Minimax distances, likewise in O(N) memory but at a significantly reduced computational cost, while still yielding a good approximation, as verified by results on clustering benchmarks. Finally, we develop an unsupervised learning framework for clustering vehicle trajectories based on Minimax distances.
The framework is validated on real-world datasets collected in real driving scenarios, on which it demonstrates satisfactory performance.
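As a generic illustration of how a state-transition matrix encodes pairwise relations between Markov-chain states, the sketch below uses a small hypothetical 3-state chain (not the cache line bouncing model from the thesis). Entry P[i, j] is the probability of moving from state i to state j, and the helper `stationary_distribution` (a name introduced here for illustration) finds the long-run fraction of time spent in each state by power iteration.

```python
import numpy as np

# Hypothetical 3-state chain: P[i, j] = probability of moving from state i to j.
# Each row sums to 1; the self-loops make the chain aperiodic.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: apply pi <- pi @ P until the change is below tol."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


pi = stationary_distribution(P)
print(pi)  # approximately [0.25, 0.5, 0.25]
```

For this birth-death chain the result can be checked by hand with detailed balance: pi[0] * 0.5 = pi[1] * 0.25 and pi[1] * 0.25 = pi[2] * 0.5.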
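The Minimax distance between objects i and j is the minimum, over all paths from i to j, of the largest edge weight on the path. A well-known equivalence is that it equals the largest edge on the minimum spanning tree (MST) path between i and j, which is also the merge height of i and j in a single-linkage dendrogram. The sketch below uses that dendrogram as an O(N)-memory hierarchical representation. This is an assumption for illustration, not necessarily the representation proposed in the thesis. It also assumes Euclidean base distances, and the function names are hypothetical. The MST is built with Prim's algorithm, computing one row of distances at a time, so no N x N matrix is ever stored.

```python
import numpy as np


def mst_prim(X):
    """Prim's algorithm on the complete Euclidean graph over the rows of X.

    Distances are computed on the fly, so besides X the working memory is O(N).
    Returns the N-1 MST edges as (weight, u, v).
    """
    n = X.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)   # cheapest known edge from each vertex to the tree
    parent = np.full(n, -1)
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((float(best[u]), int(parent[u]), u))
        d = np.linalg.norm(X - X[u], axis=1)  # one row of distances, O(N)
        closer = (~in_tree) & (d < best)
        best[closer] = d[closer]
        parent[closer] = u
    return edges


def build_hierarchy(n, edges):
    """Single-linkage dendrogram from MST edges, stored in O(N) arrays.

    Leaves are 0..n-1; internal node n+k is created by the k-th merge, and
    height[node] is the edge weight at which that node was formed.
    """
    up = np.full(2 * n - 1, -1)   # dendrogram parent of each node (-1 = root)
    height = np.zeros(2 * n - 1)
    uf = list(range(n))           # union-find over leaves
    top = list(range(n))          # current dendrogram node of each union-find root

    def find(a):
        while uf[a] != a:
            uf[a] = uf[uf[a]]     # path halving
            a = uf[a]
        return a

    for k, (w, a, b) in enumerate(sorted(edges)):  # merge in increasing weight
        ra, rb = find(a), find(b)
        node = n + k
        up[top[ra]] = node
        up[top[rb]] = node
        height[node] = w
        uf[rb] = ra
        top[ra] = node
    return up, height


def minimax_distance(i, j, up, height):
    """Minimax distance = height of the lowest common ancestor of i and j."""
    if i == j:
        return 0.0
    ancestors = set()
    a = i
    while a != -1:
        ancestors.add(a)
        a = int(up[a])
    b = j
    while b not in ancestors:
        b = int(up[b])
    return float(height[b])


# Two groups on a line: {0, 1, 2} and {10, 11}.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
edges = mst_prim(X)
up, height = build_hierarchy(len(X), edges)
print(minimax_distance(0, 2, up, height))  # 1.0 (the Euclidean distance is 2.0)
print(minimax_distance(0, 4, up, height))  # 8.0
```

The example shows the transitive behavior that lets Minimax distances follow elongated structures: points 0 and 2 are close in the Minimax sense because a chain of short steps connects them. Each query walks up the dendrogram, so it trades extra computation per query for never storing the O(N^2) distance matrix.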
