Detecting abrupt changes in time-series data has attracted rese archers in the statistics and data mining communities for decades Basseville and Nikiforov ( 1993). Based on the instantaneousness of detection, changepoint detection algorithm s can be classified into two categories: online changepoint detection and offline changepoint de tection. While the online change detection targets on data that requires instantaneous r esponses, the offline detection algorithm often triggers delay, which leads to more accurate result s. This literature review mainly focuses on the online changepoint detection algorithms. There are plenty of changepoint detection algorithms that have be en proposed and proved pragmatic. The pioneering works Basseville and Nikiforov ( 1993) compared the probability distributions of time-series samples over the past and pr esent intervals. The algorithm demonstrates an abrupt change when two distributions a re significantly different.
Detecting the emergence of an abrupt change-point is a classic problem in statistics and machine learning. Kernel-based nonparametric statistics have been proposed for this task which make fewer assumptions on the distributions than traditional parametric approach. However, none of the existing kernel statistics has provided a computationally efficient way to characterize the extremal behavior of the statistic. Such characterization is crucial for setting the detection threshold, to control the significance level in the offline case as well as the average run length in the online case. In this paper we propose two related computationally efficient M-statistics for kernel-based change-point detection when the amount of background data is large.
The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. This problem appears in many important practical settings involving personal data, including biosurveillance, fault detection, finance, signal detection, and security systems. The field of differential privacy offers data analysis tools that provide powerful worst-case privacy guarantees. We study the statistical problem of change-point problem through the lens of differential privacy. We give private algorithms for both online and offline change-point detection, analyze these algorithms theoretically, and then provide empirical validation of these results.
Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions $P$ and $Q$, and a privacy level $\varepsilon$, how many i.i.d. samples are needed to distinguish $P$ from $Q$ subject to $\varepsilon$-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of $P$ and $Q$ and the privacy level $\varepsilon$, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma in the setting of private hypothesis testing. We also give an application of our result to the private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.
We introduce a kernel-based method for change-point analysis within a sequence of temporal observations. Change-point analysis of an (unlabelled) sample of observations consists in, first, testing whether a change in the distribution occurs within the sample, and second, if a change occurs, estimating the change-point instant after which the distribution of the observations switches from one distribution to another different distribution. We propose a test statistics based upon the maximum kernel Fisher discriminant ratio as a measure of homogeneity between segments. We derive its limiting distribution under the null hypothesis (no change occurs), and establish the consistency under the alternative hypothesis (a change occurs). This allows to build a statistical hypothesis testing procedure for testing the presence of change-point, with a prescribed false-alarm probability and detection probability tending to one in the large-sample setting.