The ClusTree : indexing micro-clusters for anytime stream mining

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper, Journal article, Research, peer-review

Standard

The ClusTree : indexing micro-clusters for anytime stream mining. / Kranen, Philipp; Assent, Ira; Baldauf, Corinna; Seidl, Thomas.

In: Knowledge and Information Systems, Vol. 29, No. 2, 2011, p. 249-272.

Harvard

Kranen, P, Assent, I, Baldauf, C & Seidl, T 2011, 'The ClusTree : indexing micro-clusters for anytime stream mining', Knowledge and Information Systems, vol. 29, no. 2, pp. 249-272. https://doi.org/10.1007/s10115-010-0342-8

APA

Kranen, P., Assent, I., Baldauf, C., & Seidl, T. (2011). The ClusTree : indexing micro-clusters for anytime stream mining. Knowledge and Information Systems, 29(2), 249-272. https://doi.org/10.1007/s10115-010-0342-8

CBE

Kranen P, Assent I, Baldauf C, Seidl T. 2011. The ClusTree : indexing micro-clusters for anytime stream mining. Knowledge and Information Systems. 29(2):249-272. https://doi.org/10.1007/s10115-010-0342-8

MLA

Kranen, Philipp et al. "The ClusTree : indexing micro-clusters for anytime stream mining". Knowledge and Information Systems. 2011, 29(2). 249-272. https://doi.org/10.1007/s10115-010-0342-8

Vancouver

Kranen P, Assent I, Baldauf C, Seidl T. The ClusTree : indexing micro-clusters for anytime stream mining. Knowledge and Information Systems. 2011;29(2):249-272. https://doi.org/10.1007/s10115-010-0342-8

Author

Kranen, Philipp ; Assent, Ira ; Baldauf, Corinna ; Seidl, Thomas. / The ClusTree : indexing micro-clusters for anytime stream mining. In: Knowledge and Information Systems. 2011 ; Vol. 29, No. 2. pp. 249-272.

Bibtex

@article{56db68d827fb4960973388db3ad801a7,
title = "The ClusTree : indexing micro-clusters for anytime stream mining",
abstract = "Clustering streaming data requires algorithms that are capable of updating clustering results for the incoming data. As data is constantly arriving, time for processing is limited. Clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival times of the stream. Likewise, memory is limited, making it impossible to store all data. For clustering, we are faced with the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts to the speed of the data stream. It makes best use of the time available under the current constraints to provide a clustering of the objects seen up to that point. Our approach incorporates the age of the objects to reflect the greater importance of more recent data. For efficient and effective handling, we introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Additionally we present solutions to handle very fast streams through aggregation mechanisms and propose novel descent strategies that improve the clustering result on slower streams as long as time permits. Our experiments show that our approach is capable of handling a multitude of different stream characteristics for accurate and scalable anytime stream clustering. ",
author = "Philipp Kranen and Ira Assent and Corinna Baldauf and Thomas Seidl",
year = "2011",
doi = "10.1007/s10115-010-0342-8",
language = "English",
volume = "29",
pages = "249--272",
journal = "Knowledge and Information Systems",
issn = "0219-1377",
publisher = "Springer UK",
number = "2",

}

RIS

TY - JOUR

T1 - The ClusTree : indexing micro-clusters for anytime stream mining

AU - Kranen, Philipp

AU - Assent, Ira

AU - Baldauf, Corinna

AU - Seidl, Thomas

PY - 2011

Y1 - 2011

N2 - Clustering streaming data requires algorithms that are capable of updating clustering results for the incoming data. As data is constantly arriving, time for processing is limited. Clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival times of the stream. Likewise, memory is limited, making it impossible to store all data. For clustering, we are faced with the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts to the speed of the data stream. It makes best use of the time available under the current constraints to provide a clustering of the objects seen up to that point. Our approach incorporates the age of the objects to reflect the greater importance of more recent data. For efficient and effective handling, we introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Additionally we present solutions to handle very fast streams through aggregation mechanisms and propose novel descent strategies that improve the clustering result on slower streams as long as time permits. Our experiments show that our approach is capable of handling a multitude of different stream characteristics for accurate and scalable anytime stream clustering.

AB - Clustering streaming data requires algorithms that are capable of updating clustering results for the incoming data. As data is constantly arriving, time for processing is limited. Clustering has to be performed in a single pass over the incoming data and within the possibly varying inter-arrival times of the stream. Likewise, memory is limited, making it impossible to store all data. For clustering, we are faced with the challenge of maintaining a current result that can be presented to the user at any given time. In this work, we propose a parameter-free algorithm that automatically adapts to the speed of the data stream. It makes best use of the time available under the current constraints to provide a clustering of the objects seen up to that point. Our approach incorporates the age of the objects to reflect the greater importance of more recent data. For efficient and effective handling, we introduce the ClusTree, a compact and self-adaptive index structure for maintaining stream summaries. Additionally we present solutions to handle very fast streams through aggregation mechanisms and propose novel descent strategies that improve the clustering result on slower streams as long as time permits. Our experiments show that our approach is capable of handling a multitude of different stream characteristics for accurate and scalable anytime stream clustering.

U2 - 10.1007/s10115-010-0342-8

DO - 10.1007/s10115-010-0342-8

M3 - Journal article

VL - 29

SP - 249

EP - 272

JO - Knowledge and Information Systems

JF - Knowledge and Information Systems

SN - 0219-1377

IS - 2

ER -