Collaborating Authors

Tong, Anh

Confirmatory Bayesian Online Change Point Detection in the Covariance Structure of Gaussian Processes Machine Learning

In the analysis of sequential data, the detection of abrupt changes is important in predicting future changes. In this paper, we propose statistical hypothesis tests for detecting covariance structure changes in locally smooth time series modeled by Gaussian Processes (GPs). We provide theoretically justified thresholds for the tests, and use them to improve Bayesian Online Change Point Detection (BOCPD) by confirming statistically significant changes and non-changes. Our Confirmatory BOCPD (CBOCPD) algorithm finds multiple structural breaks in GPs even when hyperparameters are not tuned precisely. We also provide conditions under which CBOCPD provides the lower prediction error compared to BOCPD. Experimental results on synthetic and real-world datasets show that our new tests correctly detect changes in the covariance structure in GPs. The proposed algorithm also outperforms existing methods for the prediction of nonstationarity in terms of both regression error and log likelihood.

Discovering Explainable Latent Covariance Structure for Multiple Time Series Machine Learning

Analyzing time series data is important to predict future events and changes in finance, manufacturing, and administrative decisions. Gaussian processes (GPs) solve regression and classification problems by choosing appropriate kernels capturing covariance structure of data. In time series analysis, GP based regression methods recently demonstrate competitive performance by decomposing temporal covariance structure. Such covariance structure decomposition allows exploiting shared parameters over a set of multiple but selected time series. In this paper, we handle multiple time series by placing an Indian Buffet Process (IBP) prior on the presence of shared kernels. We investigate the validity of model when infinite latent components are introduced. We also propose an improved search algorithm to find interpretable kernels among multiple time series along with comparison reports. Experiments are conducted on both synthetic data sets and real world data sets, showing promising results in term of structure discoveries and predictive performances.

Automatic Generation of Probabilistic Programming from Time Series Data Machine Learning

Probabilistic programming languages represent complex data with intermingled models in a few lines of code. Efficient inference algorithms in probabilistic programming languages make possible to build unified frameworks to compute interesting probabilities of various large, real-world problems. When the structure of model is given, constructing a probabilistic program is rather straightforward. Thus, main focus have been to learn the best model parameters and compute marginal probabilities. In this paper, we provide a new perspective to build expressive probabilistic program from continue time series data when the structure of model is not given. The intuition behind of our method is to find a descriptive covariance structure of time series data in nonparametric Gaussian process regression. We report that such descriptive covariance structure efficiently derives a probabilistic programming description accurately.

The Automatic Statistician: A Relational Perspective Machine Learning

Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets; US stock data, US house price index data and currency exchange rate data.