Indexing, Query Processing, and Clustering of Spatio-Temporal Text Objects

Research output: Book/anthology/dissertation/report (Ph.D. thesis)


With the increasing mobile use of the web from geo-positioned devices, the
Internet is increasingly acquiring a spatial aspect, with still more types of
content being geo-tagged. As a result of this development, a wide range of
location-aware queries and applications have emerged. The large amounts of
data available coupled with the increasing number of location-aware queries
call for efficient indexing and query processing techniques.
This dissertation investigates how to manage geo-tagged text content to
support these workloads in three specific areas: (i) grouping of spatio-textual
objects, (ii) spatio-temporal aggregates, and (iii) spatio-textual region querying
without special-purpose index structures.
First, two novel techniques to perform grouping of spatio-textual objects
are presented. In the first technique, top-k groups of objects are returned
while taking into account aspects such as group density, distance to the query,
and relevance to the query keywords. The nodes of an R-tree are extended
with compressed histograms that represent the objects contained in their subtree.
Results of empirical studies show that the approach is viable in practical
settings. In the second technique, the grouping of spatio-textual objects is
done without considering query locations, and a clustering approach is proposed
that takes into account both the spatial and textual attributes of the
objects. The technique expands clusters based on a proposed quality function
that enables clusters of arbitrary shape and density. Empirical studies show
that the approach is effective at discovering real-world points of interest.
Second, an extension of static frequent item counting techniques is proposed
to enable the processing of vocabularies that change considerably over
time. The proposed techniques adaptively maintain the most frequent items
with exact counts rather than approximations at varying spatial and temporal
granularities to support top-k spatio-temporal term queries. Studies show
that the proposed techniques excel under update and query-intensive loads.
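
To make the idea of a top-k spatio-temporal term query concrete, the following minimal sketch keeps exact term counts per grid cell and time bucket and merges them at query time. It is an illustration only, not the adaptive technique proposed in the dissertation: the class name GridTermCounter, the fixed grid, and the parameters cell_size and bucket_seconds are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import heapq


class GridTermCounter:
    """Exact term counts per (grid cell, time bucket). Hypothetical helper."""

    def __init__(self, cell_size=1.0, bucket_seconds=3600):
        self.cell_size = cell_size            # side length of a square cell
        self.bucket_seconds = bucket_seconds  # length of a time bucket
        self.counts = defaultdict(Counter)    # (cx, cy, bucket) -> Counter

    def _key(self, x, y, t):
        # Map a point and timestamp to its cell and bucket indices.
        return (int(x // self.cell_size),
                int(y // self.cell_size),
                int(t // self.bucket_seconds))

    def insert(self, x, y, t, terms):
        # Add one geo-tagged, timestamped text object.
        self.counts[self._key(x, y, t)].update(terms)

    def top_k(self, k, x_min, y_min, x_max, y_max, t_min, t_max):
        # Merge the counters of every cell/bucket overlapping the query.
        # The answer is exact at cell and bucket granularity only.
        cx0, cy0, b0 = self._key(x_min, y_min, t_min)
        cx1, cy1, b1 = self._key(x_max, y_max, t_max)
        total = Counter()
        for (cx, cy, b), counter in self.counts.items():
            if cx0 <= cx <= cx1 and cy0 <= cy <= cy1 and b0 <= b <= b1:
                total.update(counter)
        # Highest count first; ties broken alphabetically.
        return heapq.nsmallest(k, total.items(),
                               key=lambda kv: (-kv[1], kv[0]))
```

A query such as top_k(2, 0, 0, 0.9, 0.9, 0, 59) would return the two most frequent terms among the objects inserted in that cell and time bucket. Scanning all stored cells keeps the sketch short; a practical structure would need to avoid this scan and to bound memory, which is the kind of problem the abstract describes.
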
Finally, this dissertation investigates a technique to perform spatio-textual
region queries without the use of special-purpose index structures. Spatio-textual
objects are encoded into bit strings with a spatial and textual part
that may be indexed using any standard DBMS. A query processing algorithm
is proposed that provides an exact top-k result by merging partial results.
The results show excellent indexing and query execution performance on a
standard DBMS.
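
As an illustration of how an object might be encoded into a bit string with a spatial and a textual part, the sketch below concatenates a Z-order (Morton) code of the quantized coordinates with a hashed keyword signature. The abstract does not specify the encoding, so the use of Z-order interleaving, the CRC32-based signature, and the names quantize, interleave, text_signature, and encode are all assumptions made for this example.

```python
import zlib


def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    top = (1 << bits) - 1
    frac = (value - lo) / (hi - lo)
    return min(top, max(0, int(frac * (top + 1))))


def interleave(x, y, bits):
    """Z-order code: alternate the bits of x and y, most significant first."""
    code = 0
    for i in range(bits - 1, -1, -1):
        code = (code << 1) | ((x >> i) & 1)
        code = (code << 1) | ((y >> i) & 1)
    return code


def text_signature(terms, sig_bits):
    """Set one bit per term, chosen by a deterministic hash (assumed scheme)."""
    sig = 0
    for term in terms:
        sig |= 1 << (zlib.crc32(term.encode("utf-8")) % sig_bits)
    return sig


def encode(lon, lat, terms, bits=8, sig_bits=16):
    """Return the spatial part followed by the textual part as a '0'/'1' string."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    spatial = format(interleave(x, y, bits), f"0{2 * bits}b")
    textual = format(text_signature(terms, sig_bits), f"0{sig_bits}b")
    return spatial + textual
```

Because the spatial part comes first, objects in the same grid cell share a common prefix, so a string of this kind could be stored in an ordinary column and searched with a standard B-tree prefix or range scan. The hashed signature can produce false positives when two terms map to the same bit, so candidate rows would still need an exact keyword check.
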
Original language: English
Publisher: Department of Computer Science, Aarhus University
Number of pages: 171
Publication status: Published - 31 Jul 2014


ID: 79505646