Seminar: Handling Skylines with Large Outputs

Activity: Presentations, memberships, employment, ownership and other activitiesLecture and oral contribution

Description

The skyline operator is a well-established tool for reducing an input dataset to points that are “most interesting.” It returns exactly those that have larger values in at least one dimension when compared to any other (distinct) data point in the dataset. Practically, these are the data points that maximize somebody’s preferences. However, the size of the skyline is bounded only by the input size, which has two immediate consequences. First, since the execution time for the skyline operator is primarily dependent on the output size, performance suffers. Second, no user wants the entire input returned as the output.

In this talk, I address these two concerns. By breaking from the traditional share-nothing paradigm for parallelizing skyline queries, we can introduce a multicore algorithm that is the first to consistently outperform state-of-the-art sequential computation. Next, I introduce a new means of selecting a small, fixed-size subset of the skyline that minimizes the “regret” that a top-k user would suffer if using the subset rather than the full skyline. To this end, I show intractability, efficient geometric techniques for two dimensions, and effective randomized techniques for the NP-hard general case. Finally, I conclude with some interesting open problems that have arisen from this work.

Invited by Aamir Cheema of Monash University, Melbourne.
Period10 Sept 2014
Held atUnknown external organisation