Complex Event Processing under Uncertainty in RDF Stream Processing
Abstract: The Semantic Web provides a framework for representing, sharing, and integrating data on the Web using a set of specifications promoted by the World Wide Web Consortium (W3C). These specifications include RDF as the model for data interchange on the Web and languages (e.g., RDFS and OWL) for defining schemas and ontologies. While the Semantic Web has traditionally focused on static or slowly changing data, information on the Web is becoming increasingly dynamic, with sources such as Internet-of-Things devices, sensor networks, smart cities, social media, and more. RDF Stream Processing (RSP) extends Semantic Web technologies to support streaming data and continuous queries and has been suggested as a candidate for bridging the gap between Complex Event Processing (CEP), which focuses on identifying meaningful events and event patterns from streaming data, and the Semantic Web standards. Systems that operate on real-world data must often deal with uncertainty, which can arise from, for example, missing information, incomplete domain knowledge, sensor noise, or linguistic vagueness. Uncertainty has received attention in both Semantic Web and CEP research, but little is known about how it can be managed in RSP and how it might impact performance. The contributions of this thesis are threefold. First, the issue of supporting a general model of CEP in RSP is addressed. A set of requirements for CEP is identified and used to define an event ontology for use in RSP. An approach is then proposed for creating a CEP framework that can scale processing beyond the limitations of a single RSP instance. Second, an extension of the RSP-QL data model is defined for representation of statement-level annotations. The data model is then used as a basis for capturing different types of uncertainty in a use case inspired by a research project in electronic healthcare.
Finally, the performance impact of explicitly managing different types of uncertainty is evaluated in a prototype implementation, and a set of optimization strategies is introduced with the goal of reducing the impact of uncertainty on query execution performance. The results show that the proposed approach to representing statement-level metadata reduces required data transfer bandwidth and that it can improve query execution performance compared with using RDF reification. The optimization strategies produce improved query execution performance overall, but the impact of the heuristic depends on multiple factors, including the selectivity of filters, join cardinalities, and the cost of evaluating uncertainty functions.