Utilizing Structure-rich Features to improve Clustering

Author(s)
Benjamin Schelling, Lena Bauer, Sahar Behzadi Soheil, Claudia Plant
Abstract

For successful clustering, an algorithm needs to find the boundaries between clusters. While this is comparatively easy if the clusters are compact and non-overlapping, and the boundaries are thus clearly defined, features in which the clusters blend into each other hinder clustering methods from correctly estimating these boundaries. Therefore, we aim to extract features showing clear cluster boundaries and thus enhance the cluster structure in the data. Our novel technique creates a condensed version of the data set that contains the structure important for clustering but not the noise information. We demonstrate that this transformed data set is much easier to cluster, both for k-means and for various other algorithms. Furthermore, we introduce a deterministic initialisation strategy for k-means based on these structure-rich features.

Organisation(s)
Research Network Data Science, Research Group Data Mining and Machine Learning
External organisation(s)
Ludwig-Maximilians-Universität München, Munich Center for Machine Learning (MCML)
Volume
12457
Pages
91-107
No. of pages
17
DOI
https://doi.org/10.1007/978-3-030-67658-2_6
Publication date
2021
Peer reviewed
Yes
Austrian Fields of Science 2012
102033 Data mining
ASJC Scopus subject areas
Theoretical Computer Science, Computer Science (all)
Portal url
https://ucris.univie.ac.at/portal/en/publications/utilizing-structurerich-features-to-improve-clustering(49d5251f-4556-4831-be82-a002270bc370).html