Synchronization-based clustering on evolving data stream

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; Peer-reviewed


  • Junming Shao, University of Electronic Science and Technology of China
  • Yue Tan, University of Electronic Science and Technology of China
  • Lianli Gao, University of Electronic Science and Technology of China
  • Qinli Yang, University of Electronic Science and Technology of China
  • Claudia Plant, University of Vienna
  • Ira Assent

Clustering data streams is of increasing importance in many applications. In this paper, we propose a new synchronization-based clustering approach for evolving data streams, called SyncTree, which maintains all micro-clusters at different levels of granularity depending on the recency of the data. Instead of using a sliding window or decay function to focus on recent data, SyncTree sequentially summarizes all continuously arriving objects, batch by batch, as synchronized micro-clusters. Owing to the powerful concept of synchronization, the derived micro-clusters truly reflect the intrinsic cluster structure rather than merely summarizing data statistics, and old micro-clusters can be intuitively summarized at a higher level by iterative clustering to fit memory constraints. Building on these hierarchical micro-clusters, SyncTree allows the cluster structure of the data stream to be investigated between any two time stamps in the past, and also provides a principled way to analyze cluster evolution. Empirical results demonstrate that our method performs well compared to state-of-the-art algorithms.
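The abstract relies on the synchronization principle without spelling it out. As a hedged illustration only, the Python sketch below shows a Kuramoto-style, ε-neighborhood interaction model of the kind used in synchronization-based clustering, applied to a single batch of objects: each object repeatedly moves toward its neighbors by the mean sine of their coordinate differences until groups collapse onto common locations. It is not SyncTree: it has no micro-cluster hierarchy, time stamps, or memory-driven summarization. The names `sync_step` and `sync_cluster` and the parameters `max_iter`, `tol`, and `merge_tol` are hypothetical, and the data are assumed to be scaled so that ε is well below π.

```python
import math


def sync_step(points, eps):
    """One synchronous update of a Kuramoto-style interaction model.

    Each object moves toward the objects in its eps-neighborhood, dimension
    by dimension, by the mean sine of their coordinate differences.
    """
    updated = []
    for x in points:
        # The neighborhood includes x itself, so it is never empty.
        neighbors = [y for y in points if math.dist(x, y) <= eps]
        updated.append(tuple(
            xd + sum(math.sin(y[d] - xd) for y in neighbors) / len(neighbors)
            for d, xd in enumerate(x)
        ))
    return updated


def sync_cluster(points, eps, max_iter=100, tol=1e-6, merge_tol=1e-3):
    """Iterate until objects stop moving, then group co-located objects."""
    pts = [tuple(float(v) for v in p) for p in points]
    if not pts:
        return [], []
    for _ in range(max_iter):
        new_pts = sync_step(pts, eps)
        # Largest displacement of any object in this round.
        moved = max(math.dist(a, b) for a, b in zip(pts, new_pts))
        pts = new_pts
        if moved < tol:
            break
    # Objects that synchronized to (almost) the same location form one cluster.
    labels, centers = [], []
    for p in pts:
        for k, c in enumerate(centers):
            if math.dist(p, c) <= merge_tol:
                labels.append(k)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    labels, centers = sync_cluster(data, eps=0.5)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Grouping converged objects with a small `merge_tol` is a simplification chosen for readability. A streaming variant would presumably run such a step per batch and keep each synchronized group as a micro-cluster, but that is an assumption here, not the paper's specification.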

Original language: English
Journal: Information Sciences
Volume: 501
Pages (from-to): 573-587
Number of pages: 15
ISSN: 0020-0255
Status: Published - Oct 2019


ID: 138793709