Why Throughput Isn't Everything: The Case of Parallelizing Skyline Queries

Activity: Talks and oral contributions

Description

The extreme parallelism available in modern hardware suggests a way to combat the Big Data deluge. However, harnessing that parallelism can be quite challenging for many data management problems. The skyline query, which filters an input dataset down to only its most salient points (those not dominated by any other point), is one such example. We show that sophisticated single-threaded algorithms can outperform high-throughput parallel algorithms by orders of magnitude, even when the parallel algorithms run on state-of-the-art graphics processing units (GPUs) with 2680 physical cores. In this talk, I discuss how considering work-efficiency (the idea that parallel algorithms must be clever, too, even at the expense of throughput) can lead to algorithms that drastically outperform both sequential and massive-throughput competitors. The material is based on a paper we presented at ICDE 2015 (on multicore CPUs) and a paper that will be presented at VLDB 2015 (focusing on GPUs). At the end of the talk, I will discuss how these challenges manifest themselves again in ongoing work on clustering natural language in social media.
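The abstract does not describe the algorithms from either paper, so the following is only a hypothetical illustration of the skyline query and of the work-efficiency idea. It assumes smaller values are better in every dimension, so point p dominates q when p is no worse than q everywhere and strictly better somewhere. `skyline_all_pairs` is a stand-in for a throughput-oriented approach: every dominance test is independent, which suits massive parallelism, but it always performs n(n-1) tests. `skyline_sort_filter` swaps in a sort-first filter in the spirit of the Sort-Filter-Skyline technique: after sorting by coordinate sum, a point can only be dominated by points that precede it, so each point is compared only against the skyline found so far. All function names and the sample data are invented for this sketch.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is <= q in every dimension and < q in at least one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_all_pairs(points: List[Point]) -> Tuple[List[Point], int]:
    """Brute force: test each point against every other point.

    The scan deliberately does not stop early, mimicking a fixed-work
    data-parallel kernel, so it always performs n * (n - 1) tests.
    """
    tests = 0
    result: List[Point] = []
    for i, q in enumerate(points):
        dominated = False
        for j, p in enumerate(points):
            if i == j:
                continue
            tests += 1
            if dominates(p, q):
                dominated = True
        if not dominated:
            result.append(q)
    return result, tests


def skyline_sort_filter(points: List[Point]) -> Tuple[List[Point], int]:
    """Sort by coordinate sum, then compare each point only to the skyline so far.

    If p dominates q, then sum(p) < sum(q), so every potential dominator of q
    is processed before q, and points already in the window are final.
    """
    tests = 0
    window: List[Point] = []
    for q in sorted(points, key=sum):
        is_dominated = False
        for p in window:
            tests += 1
            if dominates(p, q):
                is_dominated = True
                break  # one dominator is enough to discard q
        if not is_dominated:
            window.append(q)
    return window, tests


# Hypothetical sample data: two-dimensional points, smaller is better.
points: List[Point] = [(1, 9), (2, 4), (3, 3), (4, 2), (5, 5), (9, 1), (6, 7)]
sky_a, tests_a = skyline_all_pairs(points)
sky_s, tests_s = skyline_sort_filter(points)
print(sorted(sky_a) == sorted(sky_s))  # True
print(tests_a, tests_s)  # 42 12
```

Both functions return the same skyline, but the sort-first version does far fewer dominance tests even on seven points. The gap grows with input size, which is the sense in which raw throughput on the brute-force version can lose to a cleverer algorithm.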
Period: 10 Sep 2015
Event title: Why Throughput Isn't Everything: The Case of Parallelizing Skyline Queries
Event type: Seminar
Location: Burnaby, Canada