EGG-SynC: Exact GPU-parallelized Grid-based Clustering by Synchronization

Jakob Rødsgaard Jørgensen, Ira Assent

Research output: Contribution to book/anthology/report/proceeding > Article in proceedings > Research > peer-review

Abstract

Clustering by synchronization (SynC) is a clustering method that is motivated by the natural phenomenon of synchronization and is based on the Kuramoto model. The idea is to iteratively drag similar objects closer to each other until they have synchronized. SynC has been adapted to solve several well-known data mining tasks such as subspace clustering, hierarchical clustering, and streaming clustering. This shows that the SynC model is very versatile. Sadly, SynC has an O(T × n² × d) complexity, which makes it impractical for larger datasets. For example, Chen et al. [8] show runtimes of more than 10 hours for just n = 70,000 data points, but improve this to just above one hour by using R-Trees in their method FSynC. Both are still impractical in real-life scenarios. Furthermore, SynC uses a termination criterion that gives no guarantee that the points have synchronized; instead, it simply stops when most points are close to synchronizing. In this paper, our contributions are manifold. We propose a new termination criterion that guarantees that all points have synchronized. To achieve a much-needed reduction in runtime, we propose a strategy to summarize partitions of the data into a grid structure, a GPU-friendly grid structure to support this and neighborhood queries, and a GPU-parallelized algorithm for clustering by synchronization (EGG-SynC) that utilizes these ideas. Furthermore, we provide an extensive evaluation against the state of the art, showing a speedup of 2 to 3 orders of magnitude compared to SynC and FSynC.
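
To make the cost discussed in the abstract concrete, the following is a minimal Python sketch of the naive synchronization loop, not the EGG-SynC algorithm itself. It assumes the commonly described SynC update, in which each point moves by the mean sine of the coordinate-wise differences to its ε-neighbors. The names `sync_step`, `sync_cluster`, `eps`, `tol`, and `max_iter` are hypothetical, and the stopping rule (largest movement below `tol`) is an illustrative stand-in, not the termination criterion proposed in the paper.

```python
import math


def sync_step(points, eps):
    """One naive synchronization step over all points: O(n^2 * d) work."""
    new_points = []
    for p in points:
        # eps-neighborhood by Euclidean distance (assumption: includes p itself).
        neighbors = [q for q in points if math.dist(p, q) <= eps]
        # Kuramoto-style pull: shift each coordinate by the mean sine of the
        # differences to the neighbors.
        moved = [
            p[k] + sum(math.sin(q[k] - p[k]) for q in neighbors) / len(neighbors)
            for k in range(len(p))
        ]
        new_points.append(moved)
    return new_points


def sync_cluster(points, eps, tol=1e-9, max_iter=1000):
    """Repeat sync_step until the largest movement drops below tol.

    Illustrative stopping rule only; it is not the paper's criterion.
    """
    for _ in range(max_iter):
        updated = sync_step(points, eps)
        shift = max(math.dist(p, q) for p, q in zip(points, updated))
        points = updated
        if shift < tol:
            break
    return points


# Two small, well-separated groups of 2-D points.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
result = sync_cluster(data, eps=0.5)
```

Each call to `sync_step` compares every point with every other point in all d dimensions, which is where the n² × d factor comes from; T is the number of iterations until the loop stops. Points that end up at (nearly) the same location would be read as one cluster.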

Original language: English
Title of host publication: Proceedings 26th International Conference on Extending Database Technology (EDBT 2023)
Number of pages: 13
Publisher: openproceedings.org
Publication date: 2023
Pages: 195-207
ISBN (Electronic): 978-3-89318-088-2
Publication status: Published - 2023
Event: EDBT 2023: 26th International Conference on Extending Database Technology - Ioannina, Greece
Duration: 28 Mar 2023 to 31 Mar 2023

Conference

Conference: EDBT 2023: 26th International Conference on Extending Database Technology
Country/Territory: Greece
City: Ioannina
Period: 28/03/2023 to 31/03/2023
Series: Advances in Database Technology
Volume: 26
ISSN: 2367-2005
