Diversified Retrieval of Spatial Data with Context

Sammanfattning: The abundance and ubiquity of spatial datasets necessitates their effective and efficient retrieval. For instance, on the web, there are datasets with GIS objects or POIs (e.g. Spatialhadoop datasets), datasets with geo-tagged photographs (e.g. Flickr), online social networks (e.g. Facebook), and semantic knowledge graphs (e.g. Yago, DBpedia). Various spatial keyword search paradigms have been proposed that facilitate their easy retrieval by liberating users from understanding the database schemas (e.g. relational, NoSQL, RDF) and query languages (e.g. SQL, SPARQL). In a spatial keyword search paradigm the user inputs a set of keywords and a query location. The results consist of entities in geographical proximity to the query location and relevant to the query keywords. However, such results considering only relevance may include entities that are very similar to each other, compromising the query effectiveness by becoming overwhelming.In view of this limitation, we propose contextual and spatial diversification by facilitating the retrieval of spatial entities with diverse characteristics and directions with respect to the query location. We formally define the problem that combines relevance and diversity and prove that finding the optimal solutions is an NP-hard problem. Thus, we propose two greedy approximate algorithms with guaranteed quality.Furthermore, we investigate the proportional retrieval of spatial data. We argue that objects with similar context and nearby locations should be proportionally represented in the selection as this can represent insightful information to the users about the whole population of candidate objects. Proportionality dictates the pairwise comparison of all retrieved objects and hence bears a high computational cost. To address this, we propose novel algorithms which greatly reduce the cost of proportional object selection in practice. In addition, we propose pre-processing, pruning and approximate computation techniques which, combined, reduce the computational cost of the algorithms even further.Extensive empirical studies on two real datasets show that our proposed algorithms can be very fast and give timely results of excellent approximation quality. User evaluation verifies that users prefer results based on proportionality then based on diversification and lastly on relevance.

  KLICKA HÄR FÖR ATT SE AVHANDLINGEN I FULLTEXT. (PDF-format)