Complex Event Processing under Uncertainty in RDF Stream Processing

Abstract: The Semantic Web provides a framework for representing, sharing, and integrating data on the Web using a set of specifications promoted by the World Wide Web Consortium (W3C). These specifications include RDF as the model for data interchange on the Web and languages (e.g., RDFS and OWL) for defining schemas and ontologies. While the Semantic Web has traditionally focused on static or slowly changing data, information on the Web is becoming increasingly dynamic, with sources such as Internet-of-Things devices, sensor networks, smart cities, and social media. RDF Stream Processing (RSP) extends Semantic Web technologies to support streaming data and continuous queries and has been suggested as a candidate for bridging the gap between Complex Event Processing (CEP), which focuses on identifying meaningful events and event patterns from streaming data, and the Semantic Web standards. Systems that operate on real-world data must often deal with uncertainty, which can arise from, for example, missing information, incomplete domain knowledge, sensor noise, or linguistic vagueness. Uncertainty has received attention in both Semantic Web and CEP research, but little is known about how it can be managed in RSP and how it might impact performance. The contributions of this thesis are threefold. First, the issue of supporting a general model of CEP in RSP is addressed. A set of requirements for CEP is identified and used to define an event ontology for use in RSP. An approach is then proposed for creating a CEP framework that can scale processing beyond the limitations of a single RSP instance. Second, an extension of the RSP-QL data model is defined for representation of statement-level annotations. The data model is then used as a basis for capturing different types of uncertainty in a use case inspired by a research project in electronic healthcare.
Finally, the performance impact of explicitly managing different types of uncertainty is evaluated in a prototype implementation and a set of optimization strategies is introduced with a goal of reducing the impact of uncertainty on query execution performance. The results show that the proposed approach to representing statement-level metadata reduces required data transfer bandwidth and that it can improve query execution performance compared with using RDF reification. The optimization strategies produce improved query execution performance overall, but the impact of the heuristic depends on multiple factors, including the selectivity of filters, join cardinalities, and the cost of evaluating uncertainty functions.
