Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing

Abstract: Today, ubiquitous sensing technologies enable the interconnection of physical objects as part of the Internet of Things (IoT) and produce massive streams of data. In such scenarios, the demand for timely analysis has shifted data processing paradigms towards continuous, parallel, and multitier computing. However, these paradigms bring several challenges, especially regarding analysis speed, precision, cost, and deterministic execution. This thesis studies a number of these challenges to enable efficient continuous processing of data streams in a decentralized and timely manner. In the first part of the thesis, we investigate techniques that speed up processing without a loss in precision. The focus is on continuous machine learning and data mining problems that commonly appear in IoT applications, in particular continuous clustering and monitoring, for which we present novel algorithms: (i) Lisco, a sequential algorithm to cluster data points collected by LiDAR (a distance sensor that creates a 3D mapping of the environment), (ii) p-Lisco, the parallel version of Lisco, which enhances its pipeline and data parallelism, (iii) pi-Lisco, the parallel and incremental version, which reuses information to prevent redundant computations, (iv) g-Lisco, a generalized version of Lisco that clusters any data with spatio-temporal locality by leveraging the implicit ordering of the data, and (v) Amble, a continuous monitoring solution for an industrial process. In the second part, we investigate techniques that reduce analysis costs in addition to speeding up processing, while also supporting deterministic execution. The focus is on problems associated with the availability and utilization of computing resources, namely reducing data volumes, involving concurrent computing elements, and adjusting the level of concurrency.
To that end, we propose three frameworks: (i) DRIVEN, a framework to continuously compress the data and enable efficient transmission of the compact data in the processing pipeline, (ii) STRATUM, a framework to continuously pre-process the data before transferring it to upper tiers for further processing, and (iii) STRETCH, a framework to enable instantaneous elastic reconfigurations that adjust intra-node resources at runtime while ensuring determinism. The algorithms and frameworks presented in this thesis contribute to efficient online processing of data streams while utilizing available resources. Using extensive evaluations, we show the efficiency of the proposed techniques for representative IoT applications that involve a wide spectrum of platforms, and illustrate that the performance of our work exceeds that of state-of-the-art techniques.
