Quantile-based fuzzy C-means clustering of multivariate time series: Robust techniques

López-Oriona, Ángel, D'Urso, Pierpaolo, Vilar, José Antonio, Lafuente-Rego, Borja

arXiv.org Machine Learning 

Time series data have become ubiquitous nowadays, arising frequently in a broad variety of fields including medicine, computer science, finance, environmental sciences, machine learning, marketing and neuroscience, among many others. Typically, time series involve a huge number of records and exhibit dynamic behavior patterns which might change over time, and one frequently has to deal with realizations of different lengths. Due to this complex nature, standard techniques for data mining tasks such as classification, clustering or anomaly detection often produce unsatisfactory results. The complexity is even greater when dealing with high-dimensional time series, where the interdependence structure and the large dimensionality are serious obstacles to developing efficient procedures. Univariate time series (UTS) were the main focus of intensive research until recently, but multivariate time series (MTS) have lately received a great deal of attention due to advances in technology and in the storage capabilities of everyday devices. Well-known examples of MTS are multi-lead ECG signals of patients or records containing several economic indicators of a given country over time, but many other examples can easily be found in different fields. Among time series data mining tasks, clustering is a central problem. In fact, identifying groups of similar series is fundamental to many applications, as it makes it possible to detect a few representative patterns, forecast future performance, quantify affinity, and recognize dynamic changes and structural breaks, among other uses. However, unlike in traditional databases, similarity search in time series data is a complex issue that cannot be addressed with conventional methods.