Efficient penalty search for multiple changepoint problems

arXiv.org Machine Learning

In the multiple changepoint setting, various search methods have been proposed which involve optimising either a constrained or penalised cost function over possible numbers and locations of changepoints using dynamic programming. Such methods are typically computationally intensive. Recent work in the penalised optimisation setting has focussed on developing a pruning-based approach which gives an improved computational cost that, under certain conditions, is linear in the number of data points. Such an approach naturally requires the specification of a penalty to avoid under/over-fitting. Work has been undertaken to identify the appropriate penalty choice for data generating processes with known distributional form, but in many applications the model assumed for the data is not correct and these penalty choices are not always appropriate. Consequently it is desirable to have an approach that enables us to compare segmentations for different choices of penalty. To this end we present a method to obtain optimal changepoint segmentations of data sequences for all penalty values across a continuous range. This permits an evaluation of the various segmentations to identify a suitably parsimonious penalty choice. The computational complexity of this approach can be linear in the number of data points and linear in the difference between the number of changepoints in the optimal segmentations for the smallest and largest penalty values. This can be orders of magnitude faster than alternative approaches that find optimal segmentations for a range of the number of changepoints.

Detecting changes in slope with an $L_0$ penalty

arXiv.org Machine Learning

Whilst there are many approaches to detecting changes in mean for a univariate time-series, the problem of detecting multiple changes in slope has comparatively been ignored. Part of the reason for this is that detecting changes in slope is much more challenging. For example, simple binary segmentation procedures do not work for this problem, whilst efficient dynamic programming methods that work well for the change in mean problem cannot be directly used for detecting changes in slope. We present a novel dynamic programming approach, CPOP, for finding the "best" continuous piecewise-linear fit to data. We define best based on a criterion that measures fit to data using the residual sum of squares, but penalises complexity based on an $L_0$ penalty on changes in slope. We show that using such a criterion is more reliable at estimating changepoint locations than approaches that penalise complexity using an $L_1$ penalty. Empirically CPOP has good computational properties, and can analyse a time-series with over 10,000 observations and over 100 changes in a few minutes. Our method is used to analyse data on the motion of bacteria, and provides fits to the data that both have substantially smaller residual sum of squares and are more parsimonious than two competing approaches.

Lagged Exact Bayesian Online Changepoint Detection

arXiv.org Machine Learning

Identifying changes in the generative process of sequential data, known as changepoint detection, has become an increasingly important topic for a wide variety of fields. A recently developed approach, which we call EXact Online Bayesian Changepoint Detection (EXO), has shown reasonable results with efficient computation for real time updates. However, when the changes are relatively small, EXO starts to have difficulty in detecting changepoints accurately. We propose a new algorithm called $\ell$-Lag EXact Online Bayesian Changepoint Detection (LEXO-$\ell$), which improves the accuracy of the detection by incorporating $\ell$ time lags in the inference. We prove that LEXO-1 finds the exact posterior distribution for the current run length and can be computed efficiently, with extension to arbitrary lag. Additionally, we show that LEXO-1 performs better than EXO in an extensive simulation study; this study is extended to higher order lags to illustrate the performance of the generalized methodology. Lastly, we illustrate applicability with two real world data examples comparing EXO and LEXO-1.

Catching Change-points with Lasso

Neural Information Processing Systems

We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with a l1-type penalty for this purpose. We prove that, in an appropriate asymptotic framework, this method provides consistent estimators of the change-points. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data.

Modelling Airway Geometry as Stock Market Data using Bayesian Changepoint Detection

arXiv.org Machine Learning

Numerous lung diseases, such as idiopathic pulmonary fibrosis (IPF), exhibit dilation of the airways. Accurate measurement of dilatation enables assessment of the progression of disease. Unfortunately the combination of image noise and airway bifurcations causes high variability in the profiles of cross-sectional areas, rendering the identification of affected regions very difficult. Here we introduce a noise-robust method for automatically detecting the location of progressive airway dilatation given two profiles of the same airway acquired at different time points. We propose a probabilistic model of abrupt relative variations between profiles and perform inference via Reversible Jump Markov Chain Monte Carlo sampling. We demonstrate the efficacy of the proposed method on two datasets; (i) images of healthy airways with simulated dilatation; (ii) pairs of real images of IPF-affected airways acquired at 1 year intervals. Our model is able to detect the starting location of airway dilatation with an accuracy of 2.5mm on simulated data. The experiments on the IPF dataset display reasonable agreement with radiologists. We can compute a relative change in airway volume that may be useful for quantifying IPF disease progression.