Synchronization-based clustering on evolving data stream

Author(s)
Junming Shao, Yue Tan, Lianli Gao, Qinli Yang, Claudia Plant, Ira Assent
Abstract

Clustering streams of data is of increasing importance in many applications. In this paper, we propose a new synchronization-based clustering approach for evolving data streams, called SyncTree, which maintains all micro-clusters at different levels of granularity depending upon the data recency. Instead of using a sliding window or decay function to focus on recent data, SyncTree summarizes all continuously arriving objects as synchronized micro-clusters sequentially in a batch fashion. Owing to the powerful concept of synchronization, the derived micro-clusters truly reflect the intrinsic cluster structure rather than summarizing statistics of the data, and old micro-clusters can be intuitively summarized at a higher level by iterative clustering to fit memory constraints. Building upon the hierarchical micro-clusters, SyncTree allows investigating the cluster structure of the data stream between any two time stamps in the past, and also provides a principled way to analyze the cluster evolution. Empirical results demonstrate that our method has good performance compared to state-of-the-art algorithms.

Organisation(s)
Research Group Data Mining and Machine Learning
External organisation(s)
University of Electronic Science and Technology of China, Aarhus University
Journal
Information Sciences
Volume
501
Pages
573-587
No. of pages
15
ISSN
0020-0255
DOI
https://doi.org/10.1016/j.ins.2018.09.035
Publication date
09-2018
Peer reviewed
Yes
Austrian Fields of Science 2012
102033 Data mining
Keywords
ASJC Scopus subject areas
Software, Information Systems and Management, Artificial Intelligence, Theoretical Computer Science, Control and Systems Engineering, Computer Science Applications
Portal url
https://ucris.univie.ac.at/portal/en/publications/synchronizationbased-clustering-on-evolving-data-stream(0877e080-d7c0-4bfa-a3ba-3bca31584d3f).html