Aarhus University Seal

GPU-INSCY: A GPU-Parallel Algorithm and Tree Structure for Efficient Density-based Subspace Clustering

Research output: Contribution to conferencePaperResearchpeer-review

Documents

  • p31

    Final published version, 1.02 MB, PDF document

DOI

Subspace clustering is the task of grouping objects based on mutual similarity in subspaces of the full-dimensional space.
The INSCY algorithm extends the well-known density-based clustering algorithm DBSCAN. It finds dimensionality-unbiased non-redundant subspace clusters using a tree structure to speed up the processing of subspaces. Still, finding density-based clusters in all subspaces implies an exponential search space in the number of dimensions. Thus, the running time of INSCY is still measured in hours on even small datasets of 2000 points. For larger datasets, it becomes prohibitively expensive.

To benefit from INSCY for real-world sized datasets, we propose a novel GPU-parallel approach that runs on standard graphics cards. To utilize the many cores of the GPU, we need new algorithmic strategies that fit the computational model of the GPU. While the GPU provides a large number of threads, traditional algorithms incur diverging threads and poor memory alignment, both of which lead to idle time and poor runtime performance. In INSCY, extracting subspace regions from the SCY-tree structure and the density-based clustering of regions itself are thus unfit for the GPU.

Our novel GPU-friendly algorithm GPU-INSCY computes the same subspace clustering as INSCY at dramatically reduced runtimes.
To achieve this, we devise a restructured SCY-tree index-structure and associated operations for the GPU, as well as a GPU-parallel density-based subspace clustering.

We experimentally show that GPU-INSCY scales well with the size of the dataset and the number of dimensions, and improves the running time of INSCY by a factor of several thousand for large datasets of high dimensionality.
Original languageEnglish
Publication yearMar 2021
Number of pages12
DOIs
Publication statusPublished - Mar 2021
Event24th International Conference on Extending Database Technology -
Duration: 23 Mar 202126 Mar 2021
https://edbticdt2021.cs.ucy.ac.cy/

Conference

Conference24th International Conference on Extending Database Technology
Period23/03/202126/03/2021
Internet address

See relations at Aarhus University Citationformats

Download statistics

No data available

ID: 227959281