Clustering high dimensional data

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review

Standard

Clustering high dimensional data. / Assent, Ira.

In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 2, No. 4, 2012, p. 340-350.

Harvard

Assent, I 2012, 'Clustering high dimensional data', Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 4, pp. 340-350. https://doi.org/10.1002/widm.1062

APA

Assent, I. (2012). Clustering high dimensional data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(4), 340-350. https://doi.org/10.1002/widm.1062

CBE

Assent I. 2012. Clustering high dimensional data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2(4):340-350. https://doi.org/10.1002/widm.1062

MLA

Assent, Ira. "Clustering high dimensional data". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2012, 2(4). 340-350. https://doi.org/10.1002/widm.1062

Vancouver

Assent I. Clustering high dimensional data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2012;2(4):340-350. https://doi.org/10.1002/widm.1062

Author

Assent, Ira. / Clustering high dimensional data. In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2012 ; Vol. 2, No. 4. pp. 340-350.

Bibtex

@article{a14c4bbb7ea3445494841528304b5da2,
title = "Clustering high dimensional data",
abstract = "High-dimensional data, i.e., data described by a large number of attributes, pose specific challenges to clustering. The so-called {\textquoteleft}curse of dimensionality{\textquoteright}, coined originally to describe the general increase in complexity of various computational problems as dimensionality increases, is known to render traditional clustering algorithms ineffective. The curse of dimensionality, among other effects, means that with increasing number of dimensions, a loss of meaningful differentiation between similar and dissimilar objects is observed. As high-dimensional objects appear almost alike, new approaches for clustering are required. Consequently, recent research has focused on developing techniques and clustering algorithms specifically for high-dimensional data. Still, open research issues remain. Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Each cluster groups objects that are similar to one another, whereas dissimilar objects are assigned to different clusters, possibly separating out noise. In this manner, clusters describe the data structure in an unsupervised manner, i.e., without the need for class labels. A number of clustering paradigms exist that provide different cluster models and different algorithmic approaches for cluster detection. Common to all approaches is the fact that they require some underlying assessment of similarity between data objects. In this article, we provide an overview of the effects of high-dimensional spaces, and their implications for different clustering paradigms. We review models and algorithms that address clustering in high dimensions, with pointers to the literature, and sketch open research issues. We conclude with a summary of the state of the art.",
keywords = "Structure Discovery and Clustering",
author = "Ira Assent",
year = "2012",
doi = "10.1002/widm.1062",
language = "English",
volume = "2",
pages = "340--350",
journal = "Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery",
issn = "1942-4787",
publisher = "John Wiley \& Sons Ltd.",
number = "4",

}

RIS

TY  - JOUR
T1  - Clustering high dimensional data
AU  - Assent, Ira
PY  - 2012
Y1  - 2012
N2  - High-dimensional data, i.e., data described by a large number of attributes, pose specific challenges to clustering. The so-called ‘curse of dimensionality’, coined originally to describe the general increase in complexity of various computational problems as dimensionality increases, is known to render traditional clustering algorithms ineffective. The curse of dimensionality, among other effects, means that with increasing number of dimensions, a loss of meaningful differentiation between similar and dissimilar objects is observed. As high-dimensional objects appear almost alike, new approaches for clustering are required. Consequently, recent research has focused on developing techniques and clustering algorithms specifically for high-dimensional data. Still, open research issues remain. Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Each cluster groups objects that are similar to one another, whereas dissimilar objects are assigned to different clusters, possibly separating out noise. In this manner, clusters describe the data structure in an unsupervised manner, i.e., without the need for class labels. A number of clustering paradigms exist that provide different cluster models and different algorithmic approaches for cluster detection. Common to all approaches is the fact that they require some underlying assessment of similarity between data objects. In this article, we provide an overview of the effects of high-dimensional spaces, and their implications for different clustering paradigms. We review models and algorithms that address clustering in high dimensions, with pointers to the literature, and sketch open research issues. We conclude with a summary of the state of the art.
AB  - High-dimensional data, i.e., data described by a large number of attributes, pose specific challenges to clustering. The so-called ‘curse of dimensionality’, coined originally to describe the general increase in complexity of various computational problems as dimensionality increases, is known to render traditional clustering algorithms ineffective. The curse of dimensionality, among other effects, means that with increasing number of dimensions, a loss of meaningful differentiation between similar and dissimilar objects is observed. As high-dimensional objects appear almost alike, new approaches for clustering are required. Consequently, recent research has focused on developing techniques and clustering algorithms specifically for high-dimensional data. Still, open research issues remain. Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Each cluster groups objects that are similar to one another, whereas dissimilar objects are assigned to different clusters, possibly separating out noise. In this manner, clusters describe the data structure in an unsupervised manner, i.e., without the need for class labels. A number of clustering paradigms exist that provide different cluster models and different algorithmic approaches for cluster detection. Common to all approaches is the fact that they require some underlying assessment of similarity between data objects. In this article, we provide an overview of the effects of high-dimensional spaces, and their implications for different clustering paradigms. We review models and algorithms that address clustering in high dimensions, with pointers to the literature, and sketch open research issues. We conclude with a summary of the state of the art.
KW  - Structure Discovery and Clustering
U2  - 10.1002/widm.1062
DO  - 10.1002/widm.1062
M3  - Journal article
VL  - 2
SP  - 340
EP  - 350
JO  - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
JF  - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
SN  - 1942-4787
IS  - 4
ER  - 
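The abstract's central claim is that as the number of dimensions grows, distances lose their ability to separate similar from dissimilar objects. A small simulation can illustrate this. The code below is a minimal sketch and does not come from the article. It assumes points drawn uniformly at random from the unit hypercube and uses Euclidean distance. It also relies on a hypothetical helper, `relative_contrast`, which computes (D_max - D_min) / D_min: the gap between the farthest and the nearest neighbor of a random query point, relative to the nearest distance. A value close to 0 means that all points lie at nearly the same distance from the query.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Hypothetical helper: (D_max - D_min) / D_min of Euclidean distances
    from one random query point to n_points random points, all drawn
    uniformly from the unit hypercube [0, 1]^dim (an illustrative assumption)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Contrast between nearest and farthest neighbor for growing dimensionality.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(500, dim):.3f}")
```

When run as a script, the code prints one line per dimensionality. The contrast should shrink as `dim` grows, although the exact values depend on the seed and the sample size. Uniform data has no cluster structure, so this sketch shows only the distance-concentration effect and says nothing about how any particular clustering algorithm responds to it.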