Rethinking Return-over-Investment for Machine Learning

ML practitioners commonly split their dataset into three disjoint subsets: training, validation, and test. Multiple instances of the model are trained on the training set and evaluated on the validation set in search of the best combination of hyperparameter values. The combination that yields the highest validation performance is selected, and the final model is judged by its performance on the test set, often expressed as a single-valued metric such as accuracy. This established best practice deliberately turns a blind eye to the results on the validation set: once a piece of data has been used to alter the model (in this case, the values of the hyperparameters), it is considered tainted and can no longer measure how well the model generalizes. Dodge et al. [1] argue that discarding the validation metrics is a missed opportunity to quantify how much computation has gone into finding the right set of hyperparameters.
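The workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from Dodge et al. [1]: the synthetic 1-D dataset, the toy k-nearest-neighbour classifier, and every name here (`make_toy_data`, `split_dataset`, `candidate_ks`, and so on) are hypothetical and chosen for brevity.

```python
import random


def make_toy_data(n=200, seed=0):
    """Synthetic 1-D binary data (an assumption for illustration):
    the label is 1 when x > 0, with roughly 10% of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        label = 1 if x > 0 else 0
        if rng.random() < 0.1:
            label = 1 - label
        data.append((x, label))
    return data


def split_dataset(examples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0


def accuracy(train, eval_set, k):
    """Fraction of eval_set examples that k-NN classifies correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)


data = make_toy_data()
train, val, test = split_dataset(data)

# Hyperparameter search: one validation score per candidate value of k.
candidate_ks = [1, 3, 5, 9, 15]
val_scores = {k: accuracy(train, val, k) for k in candidate_ks}

# Select the configuration with the highest validation accuracy ...
best_k = max(val_scores, key=val_scores.get)

# ... and report a single number measured on the untouched test set.
test_score = accuracy(train, test, best_k)

print("validation accuracy per k:", val_scores)
print("selected k:", best_k, "test accuracy:", test_score)
```

In the usual reporting style only `test_score` would be published. The `val_scores` dictionary, which records one entry per trained configuration, is the kind of by-product that is typically thrown away.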