Clustering piecewise stationary processes
Khaleghi, Azadeh, Ryabko, Daniil
Clustering, in an informal sense, involves dividing a dataset into possibly disjoint subsets called clusters, such that the elements within the same cluster are in some sense more similar to each other than to those in other clusters. This task, which is often a first step in data analysis, is meant to help make initial sense of data that typically have complex structure and represent some unknown underlying phenomena to be inferred. Given the nature of the problem, it is desirable to make as few assumptions as possible about the underlying mechanisms generating the data. Moreover, the minimal assumptions made should ideally be qualitative and easily verifiable from an application's perspective. In this paper we consider a subclass of the clustering problem where each data point is a time series. Such sequential data are ubiquitous in modern applications involving, for example, user behaviour, social networks, and financial or biological data, where the observations are sequential by nature and/or are collected over time. The common features of these real-world datasets are the absence of precise models and an abundance of data. From a mathematical perspective, a learning problem involving sequential data can be formulated as follows. Given sequences of the form Y,...,Y the aim is to make inference
Jun-26-2019