Collaborating Authors

 Gammerman, Alex


Validity and efficiency of the conformal CUSUM procedure

arXiv.org Artificial Intelligence

In this paper we study, both experimentally and theoretically, the validity and efficiency of a conformal version of the CUSUM procedure for change detection.
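
The abstract is terse, but the general shape of such a procedure is easy to sketch: conformal p-values are approximately uniform under the IID assumption, so a betting function that integrates to 1 over [0, 1] loses on average, and a CUSUM recursion over the accumulated log-bets grows only when the assumption fails. A minimal Python sketch, with an assumed betting function f(p) = 1.5·sqrt(1 − p) and an illustrative threshold, not the paper's exact procedure:

```python
import numpy as np

def conformal_cusum(p_values, threshold=20.0):
    """CUSUM recursion over the log-bets on a stream of conformal p-values.

    Under the IID assumption the p-values are (roughly) uniform, so the
    log-bets drift downwards and S stays near 0; sustained growth of S
    signals a change.  The betting function and threshold are assumptions.
    """
    S, alarms = 0.0, []
    for t, p in enumerate(p_values):
        bet = max(1.5 * np.sqrt(1.0 - p), 1e-12)  # f(p) = 1.5*sqrt(1-p), clipped
        S = max(0.0, S + np.log(bet))             # CUSUM recursion
        if S >= threshold:
            alarms.append(t)                      # raise an alarm ...
            S = 0.0                               # ... and restart, CUSUM-style
    return alarms
```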


Retrain or not retrain: Conformal test martingales for change-point detection

arXiv.org Machine Learning

The standard assumption in mainstream machine learning is that the observed data are IID (independent and identically distributed); we will refer to it as the IID assumption. Deviations from the IID assumption are known as dataset shift, and different kinds of dataset shift have become a popular topic of research (see, e.g., Quiñonero-Candela et al. (2009)). Testing the IID assumption has been a popular topic in statistics (see, e.g., Lehmann (2006), Chapter 7), but the mainstream work in statistics concentrates on the batch setting with each observation being a real number. In the context of deciding whether a prediction algorithm needs to be retrained, it is more important to process data online, so that at each point in time we have an idea of the degree to which the IID assumption has been discredited. It is also important that the observations are not just real numbers; in the context of machine learning the most important case is where each observation is a pair (x, y) consisting of a sample x (such as an image) and its label y. The existing work on detecting dataset shift in machine learning (see, e.g., Harel et al. (2014) and its literature review) does not have these shortcomings but does not test the IID assumption directly.
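
A conformal test martingale makes this online testing concrete: conformal p-values are distributed uniformly on [0, 1] under the IID assumption, so a betting martingale that multiplies the values of a calibrator f (any function with integral 1 over [0, 1]) at successive p-values stays small under the assumption, and by Ville's inequality a large value discredits it. A minimal sketch with one simple betting function, assuming the p-values come from a conformal transducer; the paper's specific martingales are not reproduced here:

```python
import numpy as np

def conformal_test_martingale(p_values, epsilon=0.5):
    """Simple betting martingale over a stream of conformal p-values.

    Bets with the calibrator f(p) = 1 + epsilon * (0.5 - p), which
    integrates to 1 over [0, 1].  By Ville's inequality the probability
    that the martingale ever exceeds c is at most 1/c under IID, so a
    large value is evidence against the IID assumption.
    """
    log_M, path = 0.0, []
    for p in p_values:
        log_M += np.log1p(epsilon * (0.5 - p))  # multiply in the current bet
        path.append(np.exp(log_M))
    return path  # martingale values after each observation
```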


Conformal calibrators

arXiv.org Machine Learning

Conformal predictive distributions were inspired by the work on predictive distributions in parametric statistics (see, e.g., [7, Chapter 12] and [8]) and first suggested in [14]. As usual, we will refer to algorithms producing conformal predictive distributions as conformal predictive systems (CPS, used in both singular and plural senses). Conformal predictive systems are built on top of traditional prediction algorithms to ensure a property of validity usually referred to as calibration in probability [3]. Several versions of the Least Squares Prediction Machine, CPS based on the method of Least Squares, are constructed in [14]. This construction is slightly extended to cover ridge regression and then further extended to nonlinear settings by applying the kernel trick in [12]. However, even after this extension the method is not fully adaptive, even for a universal kernel. As explained in [12, Section 7], the universality of the kernel shows in the ability of the predictive distribution function to take any shape; however, the CPS is still inflexible in that the shape does not depend, or depends weakly, on the test object. For many base algorithms full CPS (like full conformal predictors in general) are computationally inefficient, and [13] define and study computationally efficient versions of CPS, namely split-conformal predictive systems (SCPS) and cross-conformal predictive systems (CCPS).
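
To make the split-conformal construction concrete: an SCPS trains the base algorithm on a proper training set and turns the sorted calibration residuals into a predictive distribution function for each test object. A minimal sketch, with `fit` and `predict` standing in for an arbitrary base regression algorithm and with the randomized tie-breaking of [13] omitted:

```python
import numpy as np

def split_cps(train_X, train_y, calib_X, calib_y, fit, predict):
    """Minimal split-conformal predictive system (SCPS) sketch.

    `fit(X, y)` and `predict(model, X)` are an assumed interface to any
    base regression algorithm; the predictive distribution comes from the
    empirical distribution of the calibration residuals.
    """
    model = fit(train_X, train_y)
    residuals = np.sort(calib_y - predict(model, calib_X))  # calibration scores
    n = len(residuals)

    def predictive_cdf(x, y):
        """Q(y | x): estimated probability that the true label is <= y."""
        y_hat = predict(model, [x])[0]
        # rank of y - y_hat among the n calibration residuals
        return np.searchsorted(residuals, y - y_hat, side="right") / (n + 1)

    return predictive_cdf
```

Calling predictive_cdf(x, y) over a grid of y values traces out the predictive distribution function for the test object x.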


Conformal predictive distributions with kernels

arXiv.org Machine Learning

This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was so deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. The second development is combining predictive distributions with kernel methods, which were originated by one of those groups, including Emmanuel Braverman.


Learning by Transduction

arXiv.org Machine Learning

We describe a method for predicting the classification of an object given the classifications of the objects in the training set, assuming that the object/classification pairs are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives not only the prediction itself but also a practicable measure of the evidence found in support of that prediction. We also describe a procedure for assigning degrees of confidence to predictions made by the support vector machine. Some experimental results are presented, and possible extensions of the algorithms are discussed.
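
The measure of evidence is computed transductively: each candidate label is tried in turn, a nonconformity (strangeness) score is computed for every example in the extended sequence, and the fraction of examples at least as strange as the new one gives a p-value for that label. The following schematic sketch assumes a user-supplied nonconformity function; in the paper the scores are derived from the Lagrange multipliers of the support-vector machine:

```python
import numpy as np

def confident_prediction(nonconformity, train, x):
    """Transductive prediction with confidence and credibility.

    `nonconformity(examples, i)` is an assumed function scoring how
    strange example i looks within `examples`; `train` is a list of
    (object, label) pairs and `x` is the new object.
    """
    labels = sorted({y for _, y in train})
    p_values = {}
    for y in labels:
        extended = train + [(x, y)]  # complete the data with candidate label y
        scores = [nonconformity(extended, i) for i in range(len(extended))]
        # p-value: fraction of examples at least as strange as the new one
        p_values[y] = np.mean([s >= scores[-1] for s in scores])
    ranked = sorted(labels, key=p_values.get, reverse=True)
    prediction = ranked[0]
    confidence = 1.0 - (p_values[ranked[1]] if len(ranked) > 1 else 0.0)
    credibility = p_values[prediction]
    return prediction, confidence, credibility
```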


On-line Prediction with Kernels and the Complexity Approximation Principle

arXiv.org Machine Learning

The paper describes an application of the Aggregating Algorithm to the problem of regression. It generalizes earlier results concerned with plain linear regression to kernel techniques and presents an on-line algorithm which performs nearly as well as any oblivious kernel predictor. The paper derives an estimate of this algorithm's performance, which is then used to obtain an application of the Complexity Approximation Principle to kernel methods.
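
In its usual kernelized statement the Aggregating Algorithm for regression admits a closed form: the new object is appended to the Gram matrix with pseudo-outcome 0, which is what distinguishes it from plain kernel ridge regression. A minimal sketch under that standard formulation; details such as the clipping of outcomes to a bounded interval, which the analysis requires, are omitted:

```python
import numpy as np

def kaar_predict(X, y, x_new, kernel, a=1.0):
    """Kernelized Aggregating Algorithm for Regression (KAAR) sketch.

    Predicts y_hat = y' (aI + K)^{-1} k, where K is the Gram matrix over
    the training objects plus the new one, y' appends pseudo-outcome 0,
    and k is the last column of K.  `kernel` and `a` are assumed inputs.
    """
    Z = list(X) + [x_new]                 # all objects, the new one last
    T = len(Z)
    K = np.array([[kernel(u, v) for v in Z] for u in Z])  # Gram matrix
    y_ext = np.append(np.asarray(y, dtype=float), 0.0)    # pseudo-outcome 0
    k = K[:, -1]                          # kernel values against x_new
    return float(y_ext @ np.linalg.solve(a * np.eye(T) + K, k))
```

Including the new object in the Gram matrix adds an extra diagonal term k(x_new, x_new), which makes the prediction slightly more regularized than kernel ridge regression; this is the standard reading of the formula rather than a claim about the paper's exact derivation.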


Online prediction of ovarian cancer

arXiv.org Artificial Intelligence

In this paper we apply computer learning methods to diagnosing ovarian cancer using the level of the standard biomarker CA125 in conjunction with information provided by mass-spectrometry. We work with a new data set collected over a period of 7 years. Using the level of CA125 and mass-spectrometry peaks, our algorithm gives probability predictions for the disease. To estimate classification accuracy we convert the probability predictions into strict predictions. Our algorithm makes fewer errors than almost any linear combination of the CA125 level and one peak's intensity (taken on the log scale). To check the power of our algorithm we use it to test the hypothesis that CA125 and the peaks contain no information useful for predicting the disease at a particular time before the diagnosis. Our algorithm produces smaller, and therefore more significant, $p$-values than the algorithm that has been previously applied to this data set. We conclude that the proposed algorithm is more reliable for prediction on new data.
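
For the conversion from probability predictions to strict predictions, a thresholding sketch; the 0.5 cut-off is an assumption, as the abstract does not fix it:

```python
import numpy as np

def strict_predictions(probabilities, labels, threshold=0.5):
    """Turn probability predictions into strict 0/1 predictions and
    report the resulting error rate, as used to estimate accuracy."""
    preds = (np.asarray(probabilities) >= threshold).astype(int)
    errors = np.mean(preds != np.asarray(labels))
    return preds, errors
```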