Information-Theoretic Non-redundant Subspace Clustering
- Author(s)
- Nina Hubig, Claudia Plant
- Abstract
A comprehensive understanding of complex data requires multiple different views. Subspace clustering methods open up multiple interesting views since they allow data objects to be assigned to different clusters in different subspaces. Conventional subspace clustering methods yield many redundant clusters or control redundancy through difficult-to-set parameters. In this paper, we employ concepts from information theory to naturally trade off the two major properties of a subspace cluster: the quality of a cluster and its redundancy with respect to the other clusters. Our novel algorithm NORD (for NOn-ReDundant) efficiently discovers the truly relevant clusters in complex data sets without requiring any kind of threshold on their redundancy. NORD also exploits the concept of microclusters to support the detection of arbitrarily shaped clusters. Our comprehensive experimental evaluation shows the effectiveness and efficiency of NORD on both synthetic and real-world data sets and provides a meaningful visualization of both the quality and the degree of redundancy of the clustering result at first glance.
- Organisation(s)
- Research Network Data Science, Research Group Data Mining and Machine Learning
- External organisation(s)
- Technische Universität München
- Pages
- 198-209
- No. of pages
- 12
- DOI
- https://doi.org/10.1007/978-3-319-57454-7_16
- Publication date
- 2017
- Peer reviewed
- Yes
- Austrian Fields of Science 2012
- 102015 Information systems
- ASJC Scopus subject areas
- Theoretical Computer Science, General Computer Science
- Portal url
- https://ucrisportal.univie.ac.at/en/publications/03e26e7e-7ab9-4a98-9eeb-e4ddc2e1a2f7