validation data
An Empirical Bayes Approach to Optimizing Machine Learning Algorithms
There is rapidly growing interest in using Bayesian optimization to tune model and inference hyperparameters for machine learning algorithms that take a long time to run. For example, Spearmint is a popular software package for selecting the optimal number of layers and learning rate in neural networks. But given that there is uncertainty about which hyperparameters give the best predictive performance, and given that fitting a model for each choice of hyperparameters is costly, it is arguably wasteful to throw away all but the best result, as per Bayesian optimization. A related issue is the danger of overfitting the validation data when optimizing many hyperparameters. In this paper, we consider an alternative approach that uses more samples from the hyperparameter selection procedure to average over the uncertainty in model hyperparameters. The resulting approach, empirical Bayes for hyperparameter averaging (EB-Hyp) predicts held-out data better than Bayesian optimization in two experiments on latent Dirichlet allocation and deep latent Gaussian models. EB-Hyp suggests a simpler approach to evaluating and deploying machine learning algorithms that does not require a separate validation data set and hyperparameter selection procedure.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > France (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (0.92)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- North America > United States (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
Supplementary Material
The supplementary material is organized as follows. We give details of the definitions and notation in Section B.1 . Then, we provide the technical details of the lower bound (Lemma 3.3). In Section D.4 we provide insights into auto-labeling using This suggests, in these settings auto-labeling using active learning followed by selective classification is expected to work well. This idea is captured by the Chow's excess risk [ Nevertheless, it would be interesting future work to explore the connections between auto-labeling and active learning with abstention.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > Pennsylvania (0.04)
- (5 more...)
- Workflow (0.46)
- Research Report > New Finding (0.46)
- South America > French Guiana (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- (3 more...)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
- North America > United States (0.68)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)