KMN - Removing Noise from K-Means Clustering Results

Author(s): Benjamin Schelling, Claudia Plant
Abstract: K-Means is one of the most important data mining techniques for scientists who want to analyze their data. However, K-Means has the disadvantage that it cannot handle noise data points. This paper proposes a technique that can be applied to a K-Means clustering result to exclude noise data points. We refer to it as KMN (short for K-Means with Noise). The technique is compatible with the different strategies for initializing K-Means and for determining the number of clusters. Moreover, it is completely parameter-free. The technique has been tested on artificial and real data sets to demonstrate its performance in comparison with other noise-excluding techniques for K-Means.
Organisation(s): Research Group Data Mining and Machine Learning, Research Network Data Science
Pages: 137-151
No. of pages: 15
DOI: https://doi.org/10.1007/978-3-319-98539-8_11
Publication date: 09-2018
Peer reviewed: Yes
Austrian Fields of Science 2012: 102033 Data mining
ASJC Scopus subject areas: Theoretical Computer Science, General Computer Science
Portal URL: https://ucrisportal.univie.ac.at/en/publications/b991dc27-f8a3-4e6a-8068-7f7c2eed2f58