Efficient Extraction of Content from Enriched Geospatial and Networked Data

Qiang Qu

Research output: PhD thesis

Abstract

Social network services such as Google Places and Twitter have led to a proliferation of user-generated web content that is constantly shared among users. These services enable access to various types of content, covering geospatial locations, textual descriptions, social relationships, and so forth, which makes it possible to extract relevant and interesting information that can then be utilized in different applications. However, web content is often semantically rich, structurally complex, and highly dynamic. This dissertation addresses some of the challenges posed by the use of such data.

First, the dissertation investigates the extraction of relevant sets of objects from collections of geo-tagged web objects, such as business directory entries. The increasing availability of such objects gives prominence to location-based queries that consider both the spatial and the textual properties of objects. Two novel query functions, both of which return sets of objects, are presented. The first integrates spatial distance to a query location and textual relevance to query keywords into a single ranking function. The second enables the specification of allying and alienating preferences on the textual (or, more generally, non-spatial) properties of the objects, and it retrieves the set of objects that best satisfies the query. The dissertation covers application scenarios for each function, presents efficient implementations, and reports experimental findings on real-world data.
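
The abstract does not state the ranking function itself. As an illustration only, the sketch below assumes a common weighted-sum form from the spatial keyword query literature: a score that mixes normalized spatial proximity to the query location with keyword overlap (Jaccard similarity), balanced by a hypothetical parameter `alpha`. The names `GeoObject`, `score`, `top_k`, and `max_dist` are also hypothetical. Unlike the thesis's functions, this sketch ranks individual objects rather than sets, and it scans every object instead of using an index.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoObject:
    # Hypothetical geo-tagged web object, e.g., a business directory entry.
    name: str
    x: float
    y: float
    keywords: frozenset


def text_relevance(obj_kw, query_kw):
    # Jaccard similarity as a stand-in textual relevance measure (assumption).
    union = obj_kw | query_kw
    if not union:
        return 0.0
    return len(obj_kw & query_kw) / len(union)


def score(obj, qx, qy, query_kw, alpha, max_dist):
    # Weighted sum of spatial proximity (1 = at the query point, 0 = at or
    # beyond max_dist) and textual relevance; higher scores rank first.
    dist = math.hypot(obj.x - qx, obj.y - qy)
    proximity = 1.0 - min(dist / max_dist, 1.0)
    return alpha * proximity + (1 - alpha) * text_relevance(obj.keywords, query_kw)


def top_k(objects, qx, qy, query_kw, k, alpha=0.5, max_dist=10.0):
    # Brute-force ranking over all objects; an efficient implementation
    # would prune candidates with a spatial-textual index instead.
    ranked = sorted(
        objects,
        key=lambda o: score(o, qx, qy, query_kw, alpha, max_dist),
        reverse=True,
    )
    return ranked[:k]
```

Raising `alpha` toward 1 favors nearby objects regardless of their keywords, while lowering it toward 0 favors keyword matches regardless of distance.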

Second, the dissertation studies the problem of compressing weighted networks. Such networks are weighted graphs that model objects and their relationships, where the weights indicate, for instance, importance. Methods are introduced that extract implicit structure from a weighted graph and represent this structure as a smaller, generalized graph obtained by merging nodes and edges of the original graph. Such generalized, compressed graphs offer a way to interpret large networks. The dissertation reports on studies that compare the proposed solutions in terms of their tradeoffs between result complexity and quality. The findings suggest that the solutions efficiently produce good results for mining applications.
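
To make the idea of a generalized graph concrete, the following sketch assumes one simple scheme, which is not necessarily the thesis's method: nodes are grouped by a given partition into supernodes, each superedge carries the mean weight of the original node pairs it covers (absent edges count as weight 0), and quality is measured as the squared reconstruction error. The function names `summarize` and `reconstruction_error` and the averaging rule are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def summarize(edges, partition):
    """Merge nodes into supernodes and compute superedge weights.

    edges: dict mapping frozenset({u, v}) -> nonnegative weight (undirected graph).
    partition: dict mapping each node to a hypothetical supernode label.
    Returns {(a, b): mean weight} with a <= b; zero-weight superedges are omitted.
    """
    members = defaultdict(list)
    for node, group in partition.items():
        members[group].append(node)
    superedges = {}
    groups = sorted(members)
    for i, a in enumerate(groups):
        for b in groups[i:]:
            if a == b:
                pairs = list(combinations(members[a], 2))  # pairs inside one supernode
            else:
                pairs = [(u, v) for u in members[a] for v in members[b]]
            if not pairs:
                continue
            total = sum(edges.get(frozenset(p), 0.0) for p in pairs)
            if total > 0:
                superedges[(a, b)] = total / len(pairs)
    return superedges


def reconstruction_error(edges, partition, superedges):
    # Sum of squared differences between original weights and the weights
    # implied by the summary, over every pair of original nodes.
    err = 0.0
    for u, v in combinations(sorted(partition), 2):
        a, b = sorted((partition[u], partition[v]))
        approx = superedges.get((a, b), 0.0)
        err += (edges.get(frozenset((u, v)), 0.0) - approx) ** 2
    return err
```

Coarser partitions yield smaller summaries but usually a larger reconstruction error, which is one way to express the tradeoff between result complexity and quality.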

Finally, the dissertation investigates how to summarize dynamic diffusion processes in networks, such as the diffusion of microblog posts in a social network, as the processes evolve. The summarization captures ‘interesting’ developments as they occur, in an online fashion. The proposed OSNet framework adopts a spreading tree model and includes algorithms that enable efficient summarization in real time. Empirical studies show that OSNet is effective at summarizing diffusion processes into traceable summaries as they evolve.
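
OSNet's actual spreading tree model and its notion of ‘interesting’ developments are not specified in this abstract. As a rough illustration only, the sketch below assumes that each reshare attaches to the post it reshares, and it treats a node whose subtree first reaches a size threshold as an ‘interesting’ development. The class `SpreadingTree` and its `threshold` parameter are hypothetical.

```python
class SpreadingTree:
    """Hypothetical online spreading tree for one original post.

    Each reshare event (child, parent) attaches `child` under the post it
    reshared. After every event, nodes whose subtree size first reaches
    `threshold` are reported as a stand-in for an 'interesting' development.
    """

    def __init__(self, root, threshold):
        self.parent = {root: None}  # node -> parent (None for the root)
        self.size = {root: 1}       # node -> number of nodes in its subtree
        self.threshold = threshold
        self.reported = set()       # nodes already reported once

    def add(self, child, parent):
        if parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        if child in self.parent:
            raise ValueError(f"{child!r} is already in the tree")
        self.parent[child] = parent
        self.size[child] = 1
        newly_interesting = []
        node = parent
        # Walk up to the root, updating subtree sizes on the path: O(depth) per event.
        while node is not None:
            self.size[node] += 1
            if self.size[node] >= self.threshold and node not in self.reported:
                self.reported.add(node)
                newly_interesting.append(node)
            node = self.parent[node]
        return newly_interesting
```

With `threshold=3`, adding `u1` under `p0` and then `u2` under `u1` reports `p0`, whose subtree then holds three nodes.
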
Original language: English
Publication status: Published, 29 Sept 2014

Keywords

  • Content extraction
  • Spatial data management
  • Web content
  • Social networks
  • Data mining
  • Graph mining
  • Spatial keyword query
  • Preference query
  • Graph compression
  • Graph summarization
  • Dynamic networks
  • Information propagation
