EGG-SynC: Exact GPU-parallelized Grid-based Clustering by Synchronization

Jakob Rødsgaard Jørgensen, Ira Assent

Publication: Contribution to book/anthology/report/proceedings; Conference article in proceedings; Research; Peer-reviewed

Abstract

Clustering by synchronization (SynC) is a clustering method that is motivated by the natural phenomenon of synchronization and is based on the Kuramoto model.
The idea is to iteratively drag similar objects closer to each other until they have synchronized.
SynC has been adapted to several well-known data mining tasks, such as subspace clustering, hierarchical clustering, and streaming clustering, which shows that the SynC model is very versatile.
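To make "dragging similar objects closer" concrete, here is a minimal Python sketch of one synchronous SynC-style iteration. It assumes the Kuramoto-inspired interaction model usually given for SynC: each coordinate of a point x moves by the mean of sin(y_d - x_d) over all points y in its ε-neighborhood (x included). The function name `sync_step` and the parameter `eps` are illustrative and not taken from the paper. The brute-force neighbor scan costs O(n² · d) per iteration, which is the source of the O(T · n² · d) complexity.

```python
import math


def sync_step(points, eps):
    """Apply one synchronous SynC-style update to every point (hypothetical helper).

    points: list of equal-length coordinate lists; eps: neighborhood radius.
    The brute-force neighbor scan makes this O(n^2 * d) per call.
    """
    new_points = []
    for x in points:
        # eps-neighborhood of x, computed from the old positions (x itself included)
        nbrs = [y for y in points if math.dist(x, y) <= eps]
        # Kuramoto-style coupling: each coordinate moves by the mean sine of the differences
        new_points.append([
            x_d + sum(math.sin(y[d] - x_d) for y in nbrs) / len(nbrs)
            for d, x_d in enumerate(x)
        ])
    return new_points
```

Because sin(δ) ≈ δ for small differences, a tight group collapses almost onto its centroid in one step, while groups farther apart than ε are updated independently of each other.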
Unfortunately, SynC has O(T · n² · d) complexity, where T is the number of iterations, n the number of points, and d the dimensionality, which makes it impractical for larger datasets.
For example, Chen et al. report runtimes of more than 10 hours for just n = 70,000 data points; their method FSynC reduces this to just above one hour by using R-trees. Both runtimes are still impractical in real-life scenarios.
Furthermore, SynC uses a termination criterion that gives no guarantee that the points have synchronized; instead, it simply stops when most points are close to synchronizing.
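For context on the termination issue, the sketch below computes the cluster order parameter r_c that SynC's convergence test is commonly described as monitoring: the average over all points of the mean of exp(-||x - y||) over each point's ε-neighborhood. This definition is an assumption based on the usual description of SynC. The helper `fully_synchronized` is a hypothetical exact check added only for contrast; it is not the new criterion proposed in this paper.

```python
import math


def order_parameter(points, eps):
    """Cluster order parameter r_c in (0, 1] (assumed SynC definition).

    Equals 1 only when every point coincides with all of its eps-neighbors.
    """
    total = 0.0
    for x in points:
        nbrs = [y for y in points if math.dist(x, y) <= eps]  # includes x itself
        total += sum(math.exp(-math.dist(x, y)) for y in nbrs) / len(nbrs)
    return total / len(points)


def fully_synchronized(points, eps):
    """Hypothetical exact check: no point has an eps-neighbor at a different location."""
    return all(x == y for x in points for y in points if math.dist(x, y) <= eps)
```

Two points 0.001 apart already push r_c above 0.999, so a threshold on r_c can report convergence for points that have not actually met, which is the weakness described above.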

In this paper, our contributions are manifold.
We propose a new termination criterion that guarantees that all points have synchronized.
To achieve a much-needed reduction in runtime, we propose a strategy to summarize partitions of the data into a grid structure, a GPU-friendly grid structure that supports this summarization as well as neighborhood queries, and a GPU-parallelized algorithm for clustering by synchronization (EGG-SynC) that utilizes these ideas.
Furthermore, we provide an extensive evaluation against the state of the art, showing a speedup of 2 to 3 orders of magnitude over SynC and FSynC.
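The grid-based neighborhood queries can be illustrated with a plain uniform grid whose cells have side length ε: any point within distance ε of a query q must lie in q's cell or in one of the adjacent cells, so only 3^d cells are scanned instead of all n points. This is a sequential CPU sketch under that assumption; it does not reproduce EGG-SynC's GPU-friendly layout or its summarization of partitions, and `build_grid` and `range_query` are hypothetical names.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Bucket point indices into hypercube cells of side length eps."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(c / eps) for c in p)].append(i)
    return grid


def range_query(points, grid, eps, q):
    """Return the indices of all points within distance eps of q.

    Each coordinate of such a point differs from q by at most eps, so its cell
    index differs by at most 1 per dimension: only 3^d cells need checking.
    """
    cell = tuple(math.floor(c / eps) for c in q)
    result = []
    for offset in product((-1, 0, 1), repeat=len(q)):
        neighbor_cell = tuple(c + o for c, o in zip(cell, offset))
        for i in grid.get(neighbor_cell, []):  # .get avoids creating empty cells
            if math.dist(points[i], q) <= eps:
                result.append(i)
    return result
```

The 3^d factor grows quickly with dimensionality, so a plain scheme like this pays off mainly in low to moderate dimensions.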
Original language: English
Host publication title: Proceedings 26th International Conference on Extending Database Technology (EDBT 2023)
Number of pages: 13
Publisher: openproceedings.org
Publication date: 2023
Pages: 195-207
ISBN (Electronic): 978-3-89318-088-2
DOI
Status: Published - 2023
Event: EDBT 2023: 26th International Conference on Extending Database Technology - Ioannina, Greece
Duration: 28 Mar 2023 to 31 Mar 2023

Conference

Conference: EDBT 2023: 26th International Conference on Extending Database Technology
Country/Territory: Greece
City: Ioannina
Period: 28/03/2023 to 31/03/2023
Name: Advances in Database Technology
Volume: 26
ISSN: 2367-2005
