Non-Redundant Subspace Clusterings with Nr-Kmeans and Nr-DipMeans

Author(s)
Dominik Mautz, Wei Ye, Claudia Plant, Christian Böhm
Abstract

A large collection of objects in a high-dimensional space can often be clustered in more than one way; for instance, objects could be grouped by their shape or, alternatively, by their color. Each grouping represents a different view of the dataset. The young research field of non-redundant clustering addresses this class of problems. In this article, we follow the approach that different, non-redundant k-means-like clusterings may exist in different, arbitrarily oriented subspaces of the high-dimensional space. We assume that these subspaces (and optionally a further noise space without any cluster structure) are orthogonal to each other. This assumption enables a particularly rigorous mathematical treatment of the non-redundant clustering problem and thus a particularly efficient algorithm, which we call Nr-Kmeans (for non-redundant k-means). The superiority of our algorithm is demonstrated both theoretically and in extensive experiments. Furthermore, we propose an extension of Nr-Kmeans that harnesses Hartigan's dip test to determine the number of clusters in each subspace automatically.
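To make the core idea of multiple non-redundant views concrete, here is a minimal, self-contained Python sketch. It is not the paper's Nr-Kmeans algorithm: Nr-Kmeans jointly optimizes arbitrarily oriented orthogonal subspaces, whereas this toy assumes the subspaces are already known and axis-aligned (feature 0 vs. feature 1) and simply runs an independent 1-D k-means in each. The helper `kmeans_1d` and the toy data are illustrative inventions, used only to show how one dataset can carry two different, equally valid clusterings.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (Lloyd's algorithm) with a simple deterministic init:
    spread the initial centers over the sorted values."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its assigned values.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Toy data with two independent structures: feature 0 separates one pair of
# groups (think "shape"), feature 1 separates a different pair ("color").
points = [(0.1, 5.0), (0.2, 9.1), (0.3, 5.2), (8.0, 9.0), (8.1, 5.1), (8.2, 9.2)]

view_shape = kmeans_1d([p[0] for p in points], k=2)  # clustering in subspace {0}
view_color = kmeans_1d([p[1] for p in points], k=2)  # clustering in subspace {1}

print(view_shape)  # → [0, 0, 0, 1, 1, 1]
print(view_color)  # → [0, 1, 0, 1, 0, 1]
```

The two label vectors partition the same six points in entirely different ways, which is exactly the non-redundancy the abstract describes; Nr-Kmeans additionally learns the subspaces themselves (as rotations of the full space) rather than assuming them.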

Organisation(s)
Research Network Data Science, Research Group Data Mining and Machine Learning
External organisation(s)
Ludwig-Maximilians-Universität München, University of California, Los Angeles
Journal
ACM Transactions on Knowledge Discovery from Data
Volume
14
Pages
1-24
ISSN
1556-4681
DOI
https://doi.org/10.1145/3385652
Publication date
08-2020
Peer reviewed
Yes
Austrian Fields of Science 2012
102033 Data mining
Portal url
https://ucris.univie.ac.at/portal/en/publications/nonredundant-subspace-clusterings-with-nrkmeans-and-nrdipmeans(db10254b-3d36-45c6-a5b9-12f072cf6fbc).html