Data Smashing 2.0: Sequence Likelihood (SL) Divergence For Fast Time Series Comparison

Huang, Yi; Chattopadhyay, Ishanu

arXiv.org Machine Learning 

Abstract--Recognizing subtle historical patterns is central to modeling and forecasting problems in time series analysis. Here we introduce and develop a new approach to quantify deviations in the underlying hidden generators of observed data streams, resulting in a new, efficiently computable universal metric for time series. The proposed metric is universal in the sense that we can compare and contrast data streams regardless of where and how they are generated, and without any feature engineering step. The approach proposed in this paper is conceptually distinct from our previous work on data smashing [4], and vastly improves discrimination performance and computing speed. The core idea here is the generalization of the notion of KL divergence, often used to compare probability distributions, to a notion of divergence in time series. We call this the sequence likelihood (SL) divergence, which may be used to measure deviations within a well-defined class of discrete-valued stochastic processes. We devise efficient estimators of SL divergence from finite sample paths, and subsequently formulate a universal metric useful for computing distance between time series produced by hidden stochastic generators. We illustrate the superior performance of the new smash2.0 [...] Pattern disambiguation in two distinct applications involving electroencephalogram data and gait recognition is also illustrated. We are hopeful that the smash2.0 [...]

Efficiently learning stochastic processes is a key challenge in analyzing time dependency in domains where randomness cannot be ignored. For such learning to occur, we need to define a distance metric to compare and contrast time series. [...] However, the distance metrics mentioned all have one or both of the following limitations. First, dimensionality reduction and feature selection rely heavily on domain knowledge and inevitably incur a trade-off between precision and computability. Second, when dealing with data from nontrivial stochastic process dynamics, state-of-the-art techniques might fail to correctly estimate the similarity, or lack thereof, between exemplars.
