A U-statistic estimator for the variance of resampling-based error estimators

Fuchs, Mathias; Hornung, Roman; De Bin, Riccardo; Boulesteix, Anne-Laure

arXiv.org Machine Learning 

The goal of supervised statistical learning is to develop prediction rules that take the values of predictor variables as input and return a predicted value of the response variable. A prediction rule is typically learnt by applying a learning algorithm M to a so-called learning data set. A typical example in biomedical research is the prediction of patient outcome (e.g. The practitioners are usually interested in how accurately the prediction rule learnt from their data set predicts future patients, while methodological researchers instead want to know whether the learning algorithm is good at learning accurate prediction rules for different data sets drawn from a distribution of interest. The first perspective is called "conditional" (since it refers to a specific data set), while the latter, which we take in this paper, is called "unconditional". If the data set is very large, one can observe independent realizations of estimators of the unconditional error rates and use them for a paired t-test (see Section 2.3). In practice, however, huge data sets are rarely available. Prediction errors are thus usually estimated by resampling procedures, which split the available data set into learning and test sets a large number of times and average the estimated error over these iterations.
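The resampling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the learning algorithm (`learn_mean_rule`, which always predicts the mean response), the squared-error loss, the function name `resampling_error`, and the simulated data are all assumptions chosen only to keep the example short and self-contained.

```python
import random
from statistics import mean


def learn_mean_rule(learning_set):
    """Hypothetical learning algorithm M: returns a prediction rule that
    always predicts the mean response observed in the learning set."""
    y_bar = mean(y for _, y in learning_set)
    return lambda x: y_bar


def resampling_error(data, learn, n_learn, n_iter=200, seed=0):
    """Average test error over repeated random learning/test splits.

    data    : list of (x, y) pairs
    learn   : function mapping a learning set to a prediction rule
    n_learn : number of observations in each learning set
    n_iter  : number of random splits (assumed value, not from the paper)
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_iter):
        shuffled = data[:]
        rng.shuffle(shuffled)
        learning_set = shuffled[:n_learn]
        test_set = shuffled[n_learn:]
        rule = learn(learning_set)
        # Squared-error loss on the held-out test observations (an assumption).
        errors.append(mean((rule(x) - y) ** 2 for x, y in test_set))
    return mean(errors)


# Simulated data set of 40 observations, for illustration only.
sim = random.Random(1)
xs = [sim.uniform(0, 1) for _ in range(40)]
data = [(x, 2.0 * x + sim.gauss(0, 1)) for x in xs]

estimate = resampling_error(data, learn_mean_rule, n_learn=30)
print(f"resampling error estimate: {estimate:.3f}")
```

The returned value is a single point estimate of the prediction error. Because the learning and test sets of different iterations overlap, the per-iteration errors are not independent, so their naive sample variance should not be read as the variance of the estimator.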
