Concentration inequalities for leave-one-out cross validation
Benny Avelin, Lauri Viitasaari
In this article we prove that estimator stability is enough to show that leave-one-out cross validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond Lipschitz continuity assumptions on the loss or on the estimator. We obtain our results by relying on random variables whose distributions satisfy the logarithmic Sobolev inequality, which provides us with a relatively rich class of distributions. We illustrate our method with several examples, including linear regression, kernel density estimation, and stabilized/truncated estimators such as stabilized kernel regression.

Mathematics Subject Classification (2020): 62R07, 62G05, 60F15

Keywords: leave-one-out cross validation, concentration inequalities, logarithmic Sobolev inequality, sub-Gaussian random variables

Date: October 16, 2023

1. Introduction

It is customary in many statistical and machine learning methods to use a train-validation split, where the data is divided into a training set and a validation set. The training set is then used to estimate (train) the model, while the validation set is used to measure the performance of the fitted model.
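To fix ideas, leave-one-out cross validation admits the following standard formalization; the notation here (sample $Z_1,\dots,Z_n$, estimator $\hat f$, loss $\ell$) is illustrative and is not taken from the paper's own definitions.

\[
\widehat{R}_{\mathrm{loo}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(\hat f^{(-i)}, Z_i\bigr),
\]

where $\hat f^{(-i)}$ denotes the estimator trained on the sample with the $i$-th observation removed, so that each observation serves exactly once as a one-point validation set.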