Retrain or not retrain: Conformal test martingales for change-point detection

Vladimir Vovk, Ivan Petej, Ilia Nouretdinov, Ernst Ahlberg, Lars Carlsson, Alex Gammerman

arXiv.org Machine Learning 

The standard assumption in mainstream machine learning is that the observed data are IID (independent and identically distributed); we will refer to it as the IID assumption. Deviations from the IID assumption are known as dataset shift, and different kinds of dataset shift have become a popular topic of research (see, e.g., Quiñonero-Candela et al. (2009)). Testing the IID assumption has been a popular topic in statistics (see, e.g., Lehmann (2006), Chapter 7), but the mainstream work in statistics concentrates on the batch setting with each observation being a real number. In the context of deciding whether a prediction algorithm needs to be retrained, it is more important to process data online, so that at each point in time we have an idea of the degree to which the IID assumption has been discredited. It is also important that the observations are not just real numbers; in the context of machine learning the most important case is where each observation is a pair (x, y) consisting of a sample x (such as an image) and its label y. The existing work on detecting dataset shift in machine learning (see, e.g., Harel et al. (2014) and its literature review) does not have these shortcomings but does not test the IID assumption directly.
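The technique named in the title, conformal test martingales, builds an online test of the IID assumption by turning each new observation into a conformal p-value and multiplying the values of a betting function applied to those p-values; the resulting process is a nonnegative test martingale under the IID assumption, and large values discredit it. The sketch below illustrates this general construction on a toy one-dimensional stream. The distance-to-mean nonconformity score, the fixed betting function 1 + eps*(p - 1/2), and all numerical settings are illustrative assumptions, not the specific algorithm of the paper (which the abstract does not spell out).

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_p_value(alphas, rng):
    """Smoothed conformal p-value of the last nonconformity score in alphas."""
    a_n = alphas[-1]
    tau = rng.uniform()
    return (np.sum(alphas > a_n) + tau * np.sum(alphas == a_n)) / len(alphas)

def conformal_test_martingale(stream, eps=-0.7, rng=rng):
    """Yield the test martingale value S_n after each observation z_n.

    Betting function f(p) = 1 + eps * (p - 0.5): it is nonnegative for
    |eps| <= 1 and integrates to 1 over [0, 1], so S_n is a nonnegative
    test martingale when the observations are IID.  A negative eps bets
    on small p-values, which the score below tends to produce after a
    change point.
    """
    history = []
    S = 1.0
    for z in stream:
        history.append(float(z))
        zs = np.asarray(history)
        # Nonconformity score: distance to the mean of the bag seen so far
        # (a symmetric, hence valid, but otherwise arbitrary choice).
        alphas = np.abs(zs - zs.mean())
        p = smoothed_p_value(alphas, rng)
        S *= 1.0 + eps * (p - 0.5)
        yield S

# Toy usage: 500 IID N(0, 1) observations followed by 500 N(2, 1) ones.
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])
values = list(conformal_test_martingale(stream))
print(f"S after the IID prefix: {values[499]:.3g}")
print(f"S at the end of the stream: {values[-1]:.3g}")
```

By Ville's inequality, a nonnegative test martingale starting at 1 exceeds a threshold C with probability at most 1/C under the IID assumption, so a large final value of S (say 100 or more) can be read as evidence of dataset shift and hence as a signal that retraining may be needed.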
