Evaluating Clustering in Subspace Projections of High Dimensional Data

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Conference article; Research; peer-review

Standard

Evaluating Clustering in Subspace Projections of High Dimensional Data. / Müller, Emmanuel; Günnemann, Stephan; Assent, Ira; Seidl, Thomas.

In: V L D B Journal, No. 1, 2009, p. 1270-1281.

CBE

Müller E, Günnemann S, Assent I, Seidl T. 2009. Evaluating Clustering in Subspace Projections of High Dimensional Data. V L D B Journal. (1):1270-1281.

MLA

Müller, Emmanuel et al. "Evaluating Clustering in Subspace Projections of High Dimensional Data". V L D B Journal. 2009, (1). 1270-1281.

Vancouver

Müller E, Günnemann S, Assent I, Seidl T. Evaluating Clustering in Subspace Projections of High Dimensional Data. V L D B Journal. 2009;(1):1270-1281.

Author

Müller, Emmanuel ; Günnemann, Stephan ; Assent, Ira ; Seidl, Thomas. / Evaluating Clustering in Subspace Projections of High Dimensional Data. In: V L D B Journal. 2009 ; No. 1. pp. 1270-1281.

Bibtex

@inproceedings{2c5297966d35473c8d7015cb50f039fd,
title = "Evaluating Clustering in Subspace Projections of High Dimensional Data",
abstract = "Clustering high dimensional data is an emerging research field. Subspace clustering or projected clustering group similar objects in subspaces, i.e. projections, of the full space. In the past decade, several clustering paradigms have been developed in parallel, without thorough evaluation and comparison between these paradigms on a common basis. Conclusive evaluation and comparison is challenged by three major issues. First, there is no ground truth that describes the {"}true{"} clusters in real world data. Second, a large variety of evaluation measures have been used that reflect different aspects of the clustering result. Finally, in typical publications authors have limited their analysis to their favored paradigm only, while paying other paradigms little or no attention. In this paper, we take a systematic approach to evaluate the major paradigms in a common framework. We study representative clustering algorithms to characterize the different aspects of each paradigm and give a detailed comparison of their properties. We provide a benchmark set of results on a large variety of real world and synthetic data sets. Using different evaluation measures, we broaden the scope of the experimental analysis and create a common baseline for future developments and comparable evaluations in the field. For repeatability, all implementations, data sets and evaluation measures are available on our website.",
author = "Emmanuel M{\"u}ller and Stephan G{\"u}nnemann and Ira Assent and Thomas Seidl",
note = "Series: Proceedings of the VLDB Endowment, VLDB, 1, 2, 2150-8097 Volume: 2",
year = "2009",
language = "English",
pages = "1270--1281",
journal = "V L D B Journal",
issn = "1066-8888",
publisher = "Springer Berlin Heidelberg",
number = "1",

}

RIS

TY - GEN

T1 - Evaluating Clustering in Subspace Projections of High Dimensional Data

AU - Müller, Emmanuel

AU - Günnemann, Stephan

AU - Assent, Ira

AU - Seidl, Thomas

N1 - Series: Proceedings of the VLDB Endowment, VLDB, 1, 2, 2150-8097 Volume: 2

PY - 2009

Y1 - 2009

N2 - Clustering high dimensional data is an emerging research field. Subspace clustering or projected clustering group similar objects in subspaces, i.e. projections, of the full space. In the past decade, several clustering paradigms have been developed in parallel, without thorough evaluation and comparison between these paradigms on a common basis. Conclusive evaluation and comparison is challenged by three major issues. First, there is no ground truth that describes the "true" clusters in real world data. Second, a large variety of evaluation measures have been used that reflect different aspects of the clustering result. Finally, in typical publications authors have limited their analysis to their favored paradigm only, while paying other paradigms little or no attention. In this paper, we take a systematic approach to evaluate the major paradigms in a common framework. We study representative clustering algorithms to characterize the different aspects of each paradigm and give a detailed comparison of their properties. We provide a benchmark set of results on a large variety of real world and synthetic data sets. Using different evaluation measures, we broaden the scope of the experimental analysis and create a common baseline for future developments and comparable evaluations in the field. For repeatability, all implementations, data sets and evaluation measures are available on our website.

AB - Clustering high dimensional data is an emerging research field. Subspace clustering or projected clustering group similar objects in subspaces, i.e. projections, of the full space. In the past decade, several clustering paradigms have been developed in parallel, without thorough evaluation and comparison between these paradigms on a common basis. Conclusive evaluation and comparison is challenged by three major issues. First, there is no ground truth that describes the "true" clusters in real world data. Second, a large variety of evaluation measures have been used that reflect different aspects of the clustering result. Finally, in typical publications authors have limited their analysis to their favored paradigm only, while paying other paradigms little or no attention. In this paper, we take a systematic approach to evaluate the major paradigms in a common framework. We study representative clustering algorithms to characterize the different aspects of each paradigm and give a detailed comparison of their properties. We provide a benchmark set of results on a large variety of real world and synthetic data sets. Using different evaluation measures, we broaden the scope of the experimental analysis and create a common baseline for future developments and comparable evaluations in the field. For repeatability, all implementations, data sets and evaluation measures are available on our website.

M3 - Conference article

SP - 1270

EP - 1281

JO - V L D B Journal

JF - V L D B Journal

SN - 1066-8888

IS - 1

ER -