A Distance Metric for Sets of Events

Raphael Sahann, Claudia Plant, Torsten Möller

In this work, we introduce a novel distance metric that describes the distance between sets of events, where events in their most common form are actions that happen at a given time. More generally, an event can be any object that stands in an ordered relation to other objects. In our case, an event is a course taken by a student during a specific semester. The distance is calculated from the differences between the positional relations of all individual events in the sets. For this, we do not use the absolute positions of events but instead use the sum of differences of the relations before, concurrent, and after to express distance. We describe our metric algorithmically and evaluate it both formally and by example on an existing data set of student exams. We also show that the results of the metric are intuitive for humans to interpret by comparing them to the results of a user study that we ran. The metric can be applied to a range of problems that rely on the positional relation of events, because it removes the dependency on timestamps and replaces them with a set of ordered identifiers. We show a specific application of the metric by tackling the problem of clustering and predicting the study paths of university students.
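
The idea of comparing positional relations rather than absolute positions can be illustrated with a minimal sketch. This is not the metric as defined in the paper, since the abstract does not give the exact formula. It assumes that each set of events is a mapping from a course name to an ordinal semester, that only events present in both sets are compared, and that every pair whose relation (before, concurrent, after) differs contributes 1 to the distance. The names `relation`, `relation_distance`, and the sample study paths are hypothetical.

```python
from itertools import combinations


def relation(pos_a, pos_b):
    """Return -1 if a is before b, 0 if concurrent, 1 if a is after b."""
    return (pos_a > pos_b) - (pos_a < pos_b)


def relation_distance(path_x, path_y):
    """Count event pairs whose before/concurrent/after relation differs.

    Assumption (not from the paper): only events that occur in both
    sets are compared, and each differing pair adds 1 to the distance.
    """
    shared = sorted(set(path_x) & set(path_y))
    return sum(
        relation(path_x[a], path_x[b]) != relation(path_y[a], path_y[b])
        for a, b in combinations(shared, 2)
    )


# Hypothetical study paths: course -> semester in which it was taken.
student_1 = {"Algebra": 1, "Programming": 1, "Statistics": 2}
student_2 = {"Algebra": 1, "Programming": 2, "Statistics": 2}
student_3 = {"Algebra": 3, "Programming": 3, "Statistics": 5}

print(relation_distance(student_1, student_2))  # two pairs change relation
print(relation_distance(student_1, student_3))  # same ordering, shifted in time
```

In this sketch, `student_1` and `student_3` have distance 0 although their semesters differ, because only the ordering of courses matters. This mirrors the abstract's point about replacing timestamps with ordered identifiers.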

Research Network Data Science, Research Group Visualization and Data Analysis, Coordination of Student Services, Research Group Data Mining and Machine Learning
Austrian Fields of Science 2012
102033 Data mining