Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans
- Author(s)
- Dominik Mautz, Wei Ye, Claudia Plant, Christian Böhm
- Abstract
A large collection of objects in a high-dimensional space can often be clustered in more than one way; for instance, objects could be clustered by their shape or, alternatively, by their color. Each grouping represents a different view of the dataset. The new research field of non-redundant clustering addresses this class of problems. In this article, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and, optionally, a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically and in extensive experiments. Further, we propose an extension of Nr-Kmeans that harnesses Hartigan’s dip test to identify the number of clusters for each subspace automatically.
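The setting described in the abstract can be illustrated with a toy sketch: a dataset carrying two independent cluster structures in disjoint subspaces, each recoverable by a separate k-means run on its projection. This is not the authors' Nr-Kmeans implementation (which jointly optimizes arbitrarily oriented orthogonal subspaces); the axis-aligned split, the helper names, and plain Lloyd's k-means below are all illustrative assumptions.

```python
import numpy as np

def farthest_first_init(X, k):
    # Deterministic farthest-first seeding: each well-separated blob
    # ends up with one initial center (illustrative helper).
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, k, iters=50):
    # Plain Lloyd's k-means on one projection; a stand-in for the
    # per-subspace assignment step, NOT the joint Nr-Kmeans optimizer.
    centers = farthest_first_init(X, k)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

rng = np.random.default_rng(1)
n = 300
# Ground truth: a 2-cluster "shape" view in dims 0-1 and an independent
# 3-cluster "color" view in dims 2-3 -- an axis-aligned toy stand-in for
# the arbitrarily oriented orthogonal subspaces assumed in the paper.
shape = rng.integers(0, 2, n)
color = rng.integers(0, 3, n)
X = np.hstack([
    shape[:, None] * 8.0 + rng.normal(size=(n, 2)),
    color[:, None] * 8.0 + rng.normal(size=(n, 2)),
])

labels_shape = kmeans(X[:, :2], 2)  # non-redundant clustering in subspace 1
labels_color = kmeans(X[:, 2:], 3)  # non-redundant clustering in subspace 2
```

Each projection yields a valid grouping that carries no information about the other: the two label vectors represent the two non-redundant views of the same dataset.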
- Organisation(s)
- Research Network Data Science, Research Group Data Mining and Machine Learning
- External organisation(s)
- Ludwig-Maximilians-Universität München, University of California, Los Angeles
- Journal
- ACM Transactions on Knowledge Discovery from Data
- Volume
- 14
- Pages
- 1-24
- No. of pages
- 24
- ISSN
- 1556-4681
- DOI
- https://doi.org/10.1145/3385652
- Publication date
- 08-2020
- Peer reviewed
- Yes
- Austrian Fields of Science 2012
- 102033 Data mining
- ASJC Scopus subject areas
- Computer Science(all)
- Portal url
- https://ucris.univie.ac.at/portal/en/publications/nonredundant-subspace-clusterings-with-nrkmeans-and-nrdipmeans(db10254b-3d36-45c6-a5b9-12f072cf6fbc).html