Why Throughput Isn't Everything: The Case of Parallelizing Skyline Queries

Activity: Lecture and oral contribution

Description

The extreme parallelism available in modern hardware suggests a way to combat the Big Data deluge. However, harnessing that potential parallelism can be quite challenging for many data management problems. The skyline query, which filters an input dataset down to only its most salient points, is one such example. We show that sophisticated, single-threaded algorithms can outperform high-throughput parallel algorithms by orders of magnitude, even when the parallel algorithms run on state-of-the-art graphics processing units (GPUs) with 2680 physical cores. In this talk, I discuss how considering work-efficiency (the idea that parallel algorithms must be clever, too, even at the expense of throughput) can lead to algorithms that drastically outperform both sequential and massive-throughput competitors. The material is based on a paper we presented at ICDE 2015 (on multicore CPUs) and a paper that will be presented at VLDB 2015 (focusing on GPUs). At the end of the talk, I will discuss how these challenges manifest themselves again in some ongoing work on clustering natural language in social media.
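
For readers unfamiliar with the operator, the following is a minimal, naive sketch of a skyline computation (not taken from the talk or the papers; the example data and function names are invented for illustration). It shows the dominance test that underlies the query and why the brute-force approach does quadratic work, which is the kind of inefficiency the talk's work-efficient algorithms avoid.

def dominates(p, q):
    # p dominates q if p is at least as good in every dimension and
    # strictly better in at least one (here, smaller values are better).
    return all(pi <= qi for pi, qi in zip(p, q)) and any(pi < qi for pi, qi in zip(p, q))

def skyline(points):
    # Naive O(n^2) skyline: keep only the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical example: hotels as (price, distance_to_beach); lower is better on both.
hotels = [(50, 8), (45, 7), (60, 2), (70, 1), (65, 3)]
print(skyline(hotels))  # [(45, 7), (60, 2), (70, 1)]; the other points are dominated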
Period: 10 Sept 2015
Event title: Why Throughput Isn't Everything: The Case of Parallelizing Skyline Queries
Event type: Seminar
Location: Burnaby, Canada

Keywords

  • parallelism
  • algorithms
  • skyline
  • work-efficiency
  • throughput