Ontology-Driven Data Access and Data Integration with an Application in the Materials Design Domain

Abstract: The Semantic Web aims to make data on the web machine-readable by adding semantics to the data. Ontologies are one of its critical technologies. Because they provide a formal definition of a domain of interest, ontologies can play an important role in enabling semantics-aware data access and data integration over heterogeneous data sources. Traditionally, ontology-based data access and integration methods focus on data that follows relational data models. However, in some domains, such as materials design, both the data models in use and the ways data is shared vary. Data may be based on different data models (i.e., relational and non-relational models) and may be shared in different ways (e.g., as tabular data via SQL queries or API (Application Programming Interface) requests, or as JSON-formatted data via API requests). To address these challenges, conventional ontology-based data access and integration approaches must be adapted. The recently developed GraphQL, a framework for building APIs, is an interesting candidate for providing such an approach, although the use of GraphQL for integration has not yet been studied.

In this thesis, we propose a GraphQL-based framework for data access and integration. As part of this framework, we propose and implement a novel approach that generates GraphQL servers automatically from ontologies rather than building them from scratch. The framework is evaluated via experiments based on a synthetic benchmark dataset. Further, we use materials design as a target domain to evaluate the feasibility of the framework by applying it to the Open Databases Integration for Materials Design (OPTIMADE), a community effort to develop a specification for a common API that makes materials databases interoperable. At the beginning of this work, no ontologies existed for the domain of computational materials databases.
As our approach requires the use of an ontology, we developed one: the Materials Design Ontology (MDO). Furthermore, when new databases are added or new kinds of data are added to existing databases, the coverage of the ontology driving the GraphQL server generation may need to be enlarged. Therefore, we study how ontologies can be extended and propose an approach based on phrase-based topic modeling, formal topical concept analysis and domain expert validation. In addition to extending MDO, we also use this approach to extend two ontologies in the nanotechnology domain.
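To make the idea of generating a GraphQL schema from an ontology more concrete, the following is a minimal sketch and not the method implemented in the thesis. It assumes a toy in-memory representation of ontology classes; `OntologyClass`, `to_graphql_sdl`, the datatype mapping, and the example class and property names are all hypothetical and chosen only for illustration. Each class becomes a GraphQL object type, datatype properties become scalar fields, and object properties become list fields pointing at the range class.

```python
from dataclasses import dataclass, field

# Illustrative (assumed) mapping from XSD datatypes to GraphQL scalars.
SCALARS = {
    "xsd:string": "String",
    "xsd:integer": "Int",
    "xsd:double": "Float",
    "xsd:boolean": "Boolean",
}


@dataclass
class OntologyClass:
    """Hypothetical, simplified view of one ontology class."""
    name: str
    datatype_properties: dict = field(default_factory=dict)  # property -> XSD datatype
    object_properties: dict = field(default_factory=dict)    # property -> range class name


def to_graphql_sdl(classes):
    """Render each ontology class as a GraphQL object type in SDL."""
    blocks = []
    for cls in classes:
        lines = [f"type {cls.name} {{"]
        for prop, dtype in cls.datatype_properties.items():
            # Unknown datatypes fall back to String in this sketch.
            lines.append(f"  {prop}: {SCALARS.get(dtype, 'String')}")
        for prop, target in cls.object_properties.items():
            # Object properties are modeled as lists of the range type.
            lines.append(f"  {prop}: [{target}]")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical example classes, not taken from MDO itself.
ontology = [
    OntologyClass("Calculation", {"id": "xsd:string"}, {"hasOutputStructure": "Structure"}),
    OntologyClass("Structure", {"formula": "xsd:string", "numberOfAtoms": "xsd:integer"}),
]

schema = to_graphql_sdl(ontology)
print(schema)
```

A real generator would also need resolvers that map each field to the underlying data sources, which this sketch leaves out.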
