Synchronization-based clustering on evolving data stream

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; peer-review

  • Junming Shao, University of Electronic Science and Technology of China
  • Yue Tan, University of Electronic Science and Technology of China
  • Lianli Gao, University of Electronic Science and Technology of China
  • Qinli Yang, University of Electronic Science and Technology of China
  • Claudia Plant, University of Vienna
  • Ira Assent

Clustering streams of data is of increasing importance in many applications. In this paper, we propose a new synchronization-based clustering approach for evolving data streams, called SyncTree, which maintains all micro-clusters at different levels of granularity depending upon the data recency. Instead of using a sliding window or decay function to focus on recent data, SyncTree summarizes all continuously-arriving objects as synchronized micro-clusters sequentially in a batch fashion. Owing to the powerful concept of synchronization, the derived micro-clusters truly reflect the intrinsic cluster structure rather than summarize statistics of data, and old micro-clusters can be intuitively summarized at a higher level by iterative clustering to fit memory constraints. Building upon the hierarchical micro-clusters, SyncTree allows investigating the cluster structure of the data stream between any two time stamps in the past, and also provides a principled way to analyze the cluster evolution. Empirical results demonstrate that our method has good performance compared to state-of-the-art algorithms.
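The idea of summarizing a batch of arriving objects as synchronized micro-clusters can be illustrated with a minimal sketch. The abstract does not state the update rule, so the code below assumes a Kuramoto-style local interaction of the kind commonly used in synchronization-based clustering: each object is pulled toward the objects within a radius `eps` until positions stop changing, and objects that end up at the same location form one micro-cluster. The function names, the parameters `eps`, `max_iter`, `tol`, and `merge_tol`, and the sample batch are all hypothetical and are not taken from the paper.

```python
import math


def synchronize(points, eps=1.0, max_iter=50, tol=1e-6):
    """Pull every point toward its eps-neighbours until positions stabilise.

    Assumed Kuramoto-style rule (not confirmed by the abstract):
    x_i <- x_i + mean over neighbours y of sin(y - x_i), per dimension.
    """
    pts = [list(p) for p in points]
    for _ in range(max_iter):
        new_pts = []
        max_shift = 0.0
        for p in pts:
            # The point itself is always a neighbour (distance 0), so the
            # neighbourhood is never empty.
            neighbours = [q for q in pts if math.dist(p, q) <= eps]
            moved = [
                p[d] + sum(math.sin(q[d] - p[d]) for q in neighbours) / len(neighbours)
                for d in range(len(p))
            ]
            max_shift = max(max_shift, math.dist(p, moved))
            new_pts.append(moved)
        pts = new_pts  # synchronous update: all points move at once
        if max_shift < tol:
            break
    return pts


def micro_clusters(points, eps=1.0, merge_tol=1e-3):
    """Summarise one batch as (center, weight) pairs of synchronized points."""
    clusters = []  # each entry: [center, number of objects absorbed]
    for s in synchronize(points, eps):
        for c in clusters:
            if math.dist(c[0], s) <= merge_tol:
                c[1] += 1
                break
        else:
            clusters.append([s, 1])
    return [(tuple(c[0]), c[1]) for c in clusters]


# Hypothetical batch: two well-separated groups of three objects each.
batch = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0), (5.1, 5.2), (5.2, 5.0)]
for center, weight in micro_clusters(batch, eps=1.0):
    print(center, weight)
```

In this sketch each micro-cluster is a synchronized location plus the number of objects it absorbed, so the center is produced by the clustering dynamics themselves rather than by a separately maintained statistic. Under the same assumptions, older micro-clusters could be coarsened by running the procedure again on their centers with a larger `eps`, although a faithful version would also need to account for the weights.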

Original language: English
Journal: Information Sciences
ISSN: 0020-0255
Publication status: Accepted/In press - 24 Jan 2019

    Research areas

  • Clustering, Data stream, Evolving analysis, Synchronization

ID: 138793709