AnyDBC: An efficient anytime density-based clustering algorithm for very large complex datasets

Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review


The density-based clustering algorithm DBSCAN is a state-of-the-art data clustering technique with numerous applications in many fields. However, its O(n^2) time complexity remains a severe weakness. In this paper, we propose a novel anytime approach that addresses this problem by reducing both the range-query time and the label-propagation time of DBSCAN. Our algorithm, called AnyDBC, compresses the data into smaller density-connected subsets called primitive clusters and labels objects based on the connected components of these primitive clusters, which reduces the label-propagation time. Moreover, instead of passively performing a range query for every object as existing techniques do, AnyDBC iteratively and actively learns the current cluster structure of the data and, at each iteration, selects a few of the most promising objects for refining the clusters. As a result, it performs substantially fewer range queries than DBSCAN while still guaranteeing exactly the same final result as DBSCAN. Experiments on very large real and synthetic complex datasets show speedups of orders of magnitude over DBSCAN and its fastest variants.
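The labelling idea in the abstract (compress objects into primitive clusters, then read cluster labels off the connected components of those primitive clusters) can be illustrated with a small Python sketch. This is not AnyDBC: it performs a brute-force range query for every object, so it keeps DBSCAN's O(n^2) cost and omits the active, anytime selection of queries entirely. The names `primitive_cluster_dbscan`, `range_query` and `UnionFind` are hypothetical. The sketch assumes Euclidean distance, counts an object in its own ε-neighbourhood, and takes a primitive cluster to be simply the ε-neighbourhood of a core object; primitive clusters that share a core object are merged with union-find.

```python
import math
from typing import List, Sequence

NOISE = -1


def range_query(data: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    # Brute-force eps-neighbourhood of object i (includes i itself); O(n) per query.
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def primitive_cluster_dbscan(data: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    n = len(data)
    # Hypothetical simplification: one range query per object (no AnyDBC query savings).
    neighbours = [range_query(data, i, eps) for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    # Primitive cluster of core object p = its eps-neighbourhood.
    # containing[x] lists the core objects whose primitive cluster holds x.
    containing: List[List[int]] = [[] for _ in range(n)]
    for p in range(n):
        if is_core[p]:
            for x in neighbours[p]:
                containing[x].append(p)

    # Primitive clusters that share a core object are density-connected, so merge them.
    uf = UnionFind(n)
    for x in range(n):
        if is_core[x]:
            for p in containing[x]:
                uf.union(x, p)

    # Each connected component of primitive clusters becomes one cluster label.
    # Border objects take the label of the first primitive cluster that contains them;
    # objects in no primitive cluster are noise.
    labels = [NOISE] * n
    cluster_ids: dict = {}
    for x in range(n):
        if containing[x]:
            root = uf.find(containing[x][0])
            labels[x] = cluster_ids.setdefault(root, len(cluster_ids))
    return labels


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(primitive_cluster_dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Because two core objects within ε of each other always share a primitive cluster, merging primitive clusters on shared core objects yields the same core-object components as DBSCAN; as in DBSCAN, a border object reachable from two clusters is assigned by processing order.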

Original language: English
Title of host publication: KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 13 Aug 2016
Pages: 1025-1034
ISBN (Electronic): 9781450342322
Publication status: Published - 13 Aug 2016
Event: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, San Francisco, United States
Duration: 13 Aug 2016 to 17 Aug 2016

Conference

Conference: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
Country: United States
City: San Francisco
Period: 13/08/2016 to 17/08/2016
Sponsor: ACM SIGKDD, ACM SIGMOD

    Research areas

  • Active learning, Anytime clustering, Density-based clustering

