EGG-SynC: Exact GPU-parallelized Grid-based Clustering by Synchronization

Jakob Rødsgaard Jørgensen, Ira Assent

Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review


Clustering by synchronization (SynC) is a clustering method that is motivated by the natural phenomenon of synchronization and is based on the Kuramoto model.
The idea is to iteratively drag similar objects closer to each other until they have synchronized.
SynC has been adapted to solve several well-known data mining tasks such as subspace clustering, hierarchical clustering, and streaming clustering. This shows that the SynC model is very versatile.
Sadly, SynC has O(T * n^2 * d) complexity for T iterations over n data points in d dimensions, which makes it impractical for larger datasets.
For example, Chen et al. report runtimes of more than 10 hours for just n = 70,000 data points and reduce this to just above one hour by using R-trees in their method FSynC. Both are still impractical in real-life scenarios.
Furthermore, SynC uses a termination criterion that gives no guarantee that the points have synchronized; it simply stops when most points are close to synchronizing.
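
To make the cost concrete, the following is a minimal Python sketch of a naive synchronization loop, not the authors' implementation. It assumes the usual Kuramoto-style update in which each point moves, per dimension, by the average of sin(y - x) over its eps-neighborhood (the point itself included). The names `sync_step`, `sync`, `eps`, `max_iters`, and `tol` are hypothetical, and the stopping rule (largest displacement below `tol`) is only a placeholder; it is neither SynC's original criterion nor the one proposed in the paper.

```python
import math

def sync_step(points, eps):
    """One naive synchronization step: drag every point toward its eps-neighbors."""
    d = len(points[0])
    new_points = []
    for x in points:  # n points ...
        # ... each scanning all n points in d dimensions: O(n^2 * d) per step.
        neighbors = [y for y in points if math.dist(x, y) <= eps]
        # Assumed Kuramoto-style update, applied independently per dimension.
        # The point is its own neighbor, so len(neighbors) >= 1.
        moved = [
            x[k] + sum(math.sin(y[k] - x[k]) for y in neighbors) / len(neighbors)
            for k in range(d)
        ]
        new_points.append(moved)
    return new_points

def sync(points, eps, max_iters=100, tol=1e-9):
    """Repeat sync_step until no point moves more than tol (placeholder rule)."""
    for _ in range(max_iters):
        new_points = sync_step(points, eps)
        shift = max(math.dist(p, q) for p, q in zip(points, new_points))
        points = new_points
        if shift < tol:
            break
    return points

if __name__ == "__main__":
    # Two nearby points should collapse together; the far point is left alone.
    print(sync([[0.0], [0.1], [5.0]], eps=0.5))
```

The nested scan in `sync_step` is the source of the n^2 * d factor, and the outer loop in `sync` contributes the factor T.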

In this paper, our contributions are manifold.
We propose a new termination criterion that guarantees that all points have synchronized.
To achieve a much-needed reduction in runtime, we propose a strategy to summarize partitions of the data into a grid structure, a GPU-friendly grid structure to support this and neighborhood queries, and a GPU-parallelized algorithm for clustering by synchronization (EGG-SynC) that utilizes these ideas.
Furthermore, we provide an extensive evaluation against the state of the art showing 2 to 3 orders of magnitude speedup compared to SynC and FSynC.
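
As a rough illustration of why a grid helps with neighborhood queries, here is a small CPU-only Python sketch of a generic uniform grid. It is not the GPU-friendly structure from the paper and does not cover the summarization of partitions. It assumes cells of side length `eps`, so that every eps-neighbor of a point lies in the point's own cell or an adjacent one. The names `build_grid` and `eps_neighbors` are hypothetical.

```python
import itertools
import math
from collections import defaultdict

def cell_of(point, eps):
    """Integer coordinates of the grid cell (side length eps) containing point."""
    return tuple(math.floor(c / eps) for c in point)

def build_grid(points, eps):
    """Map each occupied cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, eps)].append(idx)
    return grid

def eps_neighbors(points, grid, eps, i):
    """Indices of all points within distance eps of points[i] (including i)."""
    x = points[i]
    cell = cell_of(x, eps)
    result = []
    # Only the 3^d cells around x can contain eps-neighbors.
    for offset in itertools.product((-1, 0, 1), repeat=len(x)):
        other = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(other, ()):
            if math.dist(x, points[j]) <= eps:
                result.append(j)
    return sorted(result)

if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.4, 0.0], [0.9, 0.9], [3.0, 3.0]]
    g = build_grid(pts, eps=0.5)
    print(eps_neighbors(pts, g, 0.5, 0))
```

The number of adjacent cells grows as 3^d, so this plain variant is only attractive in low dimensions; it is meant to show the idea of restricting distance checks to nearby cells, nothing more.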
Original language: English
Title of host publication: Proceedings 26th International Conference on Extending Database Technology (EDBT 2023)
Number of pages: 13
Publication date: 2023
ISBN (Electronic): 978-3-89318-088-2
Publication status: Published - 2023
Event: EDBT 2023: 26th International Conference on Extending Database Technology - Ioannina, Greece
Duration: 28 Mar 2023 → 31 Mar 2023


Conference: EDBT 2023: 26th International Conference on Extending Database Technology
Series: Advances in Database Technology

