RandALO: Out-of-sample risk estimation in no time flat

arXiv.org Machine Learning

Training machine learning models is often expensive, especially in large-data settings. Fitting an individual model is costly on its own, but more importantly, the best model must be chosen from a set of candidates indexed by "hyperparameters", and each candidate must be fitted and evaluated to make the optimal selection. As a result, model selection, also called hyperparameter tuning, tends to be the most computationally expensive part of the machine learning pipeline. To evaluate models, we typically set aside unseen "holdout" data to estimate the risk of the model on new samples from the training distribution. When training samples are abundant, numbering in the millions or billions, we can afford to set aside a modest holdout set of tens of thousands of examples without compromising model performance.
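
To make the cost structure concrete, the following is a minimal sketch of plain holdout-based model selection as described above. It is not the RandALO procedure. It assumes a toy one-feature ridge model with squared-error risk, and all names (`fit_ridge_1d`, `holdout_risk`, `select_by_holdout`) and the default holdout fraction are hypothetical choices made for illustration.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Ridge estimate of the slope b in the one-feature model y ≈ b * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def holdout_risk(b, xs, ys):
    """Mean squared error of slope b on held-out samples (the risk estimate)."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_by_holdout(xs, ys, lambdas, holdout_fraction=0.2, seed=0):
    """Fit one model per candidate lambda and return the one with the lowest holdout risk."""
    # Shuffle indices, then split them into a holdout part and a training part.
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_hold = max(1, int(holdout_fraction * len(idx)))
    hold, train = idx[:n_hold], idx[n_hold:]

    x_tr, y_tr = [xs[i] for i in train], [ys[i] for i in train]
    x_ho, y_ho = [xs[i] for i in hold], [ys[i] for i in hold]

    # Every candidate hyperparameter requires its own fit and its own evaluation.
    risks = {}
    for lam in lambdas:
        b = fit_ridge_1d(x_tr, y_tr, lam)
        risks[lam] = holdout_risk(b, x_ho, y_ho)

    best = min(risks, key=risks.get)
    return best, risks
```

Calling `select_by_holdout(xs, ys, [0.01, 1.0, 1000.0])` returns the selected value together with the estimated risk of each candidate. The loop over `lambdas` is where the expense accumulates: the total cost grows with the number of candidates times the cost of a single fit, and the held-out samples are not used for fitting.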