Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser
In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error of empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of Kearns et al. (1999): "bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate". We consider general loss functions and classes of predictors with finite VC-dimension. We closely follow the formalism introduced by Dudoit et al. (2003) to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, k-fold cross-validation, holdout cross-validation (or split sample), and leave-υ-out cross-validation. In particular, we focus on proving the consistency of the various cross-validation procedures, and we compare the procedures in terms of their rates of convergence. An estimation curve, whose transition phases depend on the cross-validation procedure and not only on the percentage of observations in the test sample, yields a simple rule for choosing the cross-validation procedure. An interesting consequence is that the size of the test sample is not required to grow to infinity for the cross-validation procedure to be consistent.
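To make the procedures named above concrete, the following is a minimal Python sketch of a cross-validation estimate of the risk of an empirical risk minimizer. It is an illustration under stated assumptions, not the construction used in the article: the function names, the 0-1 loss, and the finite class of threshold classifiers (a class of VC-dimension 1) are all hypothetical choices made for this example. Leave-one-out corresponds to leave-υ-out with υ = 1.

```python
import itertools
import random


def k_fold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    for fold in range(k):
        test = indices[fold::k]  # every k-th index, starting at `fold`
        held_out = set(test)
        yield [i for i in indices if i not in held_out], test


def leave_v_out_splits(n, v):
    """Yield every split with exactly v test points (v = 1 is leave-one-out)."""
    for test in itertools.combinations(range(n), v):
        held_out = set(test)
        yield [i for i in range(n) if i not in held_out], list(test)


def holdout_split(n, test_fraction):
    """Yield a single split; the data are assumed i.i.d., so the last
    observations serve as the test sample."""
    n_test = max(1, int(round(n * test_fraction)))
    yield list(range(n - n_test)), list(range(n - n_test, n))


def zero_one_loss(prediction, label):
    return float(prediction != label)


def erm_threshold(xs, ys, thresholds):
    """Return the threshold t minimizing the empirical 0-1 risk of x -> [x > t]."""
    def empirical_risk(t):
        return sum(zero_one_loss(int(x > t), y) for x, y in zip(xs, ys)) / len(xs)
    return min(thresholds, key=empirical_risk)


def cv_estimate(xs, ys, splits, thresholds):
    """Average, over all splits, of the test risk of the minimizer fitted
    on the corresponding training sample."""
    split_risks = []
    for train, test in splits:
        t = erm_threshold([xs[i] for i in train], [ys[i] for i in train], thresholds)
        risk = sum(zero_one_loss(int(xs[i] > t), ys[i]) for i in test) / len(test)
        split_risks.append(risk)
    return sum(split_risks) / len(split_risks)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 40
    xs = [rng.random() for _ in range(n)]
    # Labels follow the rule x > 0.5, flipped with probability 0.1 (synthetic noise).
    ys = [int(x > 0.5) ^ int(rng.random() < 0.1) for x in xs]
    grid = [j / 20 for j in range(21)]
    print("holdout      :", cv_estimate(xs, ys, holdout_split(n, 0.25), grid))
    print("5-fold       :", cv_estimate(xs, ys, k_fold_splits(n, 5), grid))
    print("leave-one-out:", cv_estimate(xs, ys, leave_v_out_splits(n, 1), grid))
```

All three procedures share the same averaging step and differ only in how the splits are generated, which is the sense in which a single formalism can cover them. Note that exhaustive leave-υ-out enumerates every subset of size υ, so its cost grows combinatorially with n and υ.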
Oct-30-2010
- Genre:
- Research Report (1.00)