$\pi$BO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization
Hvarfner, Carl, Stoll, Danny, Souza, Artur, Lindauer, Marius, Hutter, Frank, Nardi, Luigi
Bayesian optimization (BO) has become an established framework and popular tool for hyperparameter optimization (HPO) of machine learning (ML) algorithms. While known for its sample efficiency, vanilla BO cannot utilize readily available prior beliefs the practitioner has about the potential location of the optimum. To address this issue, we propose πBO, an acquisition function generalization which incorporates prior beliefs about the location of the optimum in the form of a probability distribution, provided by the user. In contrast to previous approaches, πBO is conceptually simple and can easily be integrated with existing libraries and many acquisition functions. We provide regret bounds when πBO is applied to the common Expected Improvement acquisition function and prove convergence at regular rates independently of the prior. Further, our experiments show that πBO outperforms competing approaches across a wide suite of benchmarks and prior characteristics. We also demonstrate that πBO improves on the state-of-the-art performance for a popular deep learning task, with a 12.5× time-to-accuracy speedup over prominent BO approaches.

The optimization of expensive black-box functions is a prominent task, arising across a wide range of applications. Despite the demonstrated effectiveness of BO for HPO (Bergstra et al., 2011; Turner et al., 2021), its adoption among practitioners remains limited. In a survey covering NeurIPS 2019 and ICLR 2020 (Bouthillier & Varoquaux, 2020), manual search was shown to be the most prevalent tuning method, with BO accounting for less than 7% of all tuning efforts. As the understanding of hyperparameter settings in deep learning (DL) models increases (Smith, 2018), so too does the tuning proficiency of practitioners (Anand et al., 2020). As previously shown (Smith, 2018; Anand et al., 2020; Souza et al., 2021; Wang et al., 2019), this knowledge manifests in choosing single configurations or regions of hyperparameters that presumably yield good results, demonstrating a belief over the location of the optimum. BO's inability to properly incorporate such beliefs is a reason why practitioners prefer manual search to BO (Wang et al., 2019), despite its documented shortcomings (Bergstra & Bengio, 2012). To improve the usefulness of automated HPO approaches for ML practitioners, the ability to incorporate such knowledge is pivotal. Well-established BO frameworks (Snoek et al., 2012; Hutter et al., 2011; The GPyOpt authors, 2016; Kandasamy et al., 2020; Balandat et al., 2020) support user input only to a limited extent, such as by biasing the initial design or by narrowing the search space; however, this type of hard prior can lead to poor performance by missing important regions.
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- South America > Brazil > Minas Gerais (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
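To make the πBO mechanism above concrete, here is a minimal sketch of a prior-weighted Expected Improvement: the standard EI score is multiplied by the user's belief π(x) raised to a power β/n that decays with the iteration count n, so the prior's influence fades over time and plain EI (and hence its convergence behaviour) is recovered asymptotically. This is an illustration under assumptions, not the paper's reference implementation: the GP posterior below is a stand-in, and the function names, β value, and 1-D search space are our own choices.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Standard EI for minimization under a Gaussian posterior."""
    z = (f_best - mu) / np.maximum(sigma, 1e-12)
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def pi_bo_acquisition(x, mu, sigma, f_best, prior_pdf, n, beta=10.0):
    """EI weighted by the user prior, with influence decaying as beta / n.

    prior_pdf: the user's belief pi(x) over the optimum's location.
    n: current iteration (1-based); as n grows, pi(x) ** (beta / n) -> 1,
    so the acquisition reverts to plain EI.
    """
    return expected_improvement(mu, sigma, f_best) * prior_pdf(x) ** (beta / n)

# Toy usage: a Gaussian prior centered where the user believes the optimum is.
prior = lambda x: norm.pdf(x, loc=0.3, scale=0.2)
xs = np.linspace(0, 1, 201)
mu, sigma = np.sin(6 * xs), 0.1 + 0.2 * xs   # stand-in for a GP posterior
scores = pi_bo_acquisition(xs, mu, sigma, f_best=mu.min(), prior_pdf=prior, n=5)
print("next query:", xs[np.argmax(scores)])
```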
Feature Engineering and Forecasting via Integration of Derivative-free Optimization and Ensemble of Sequence-to-sequence Networks: Renewable Energy Case Studies
Pirhooshyaran, Mohammad, Snyder, Lawrence V., Scheinberg, Katya
This research introduces a framework for forecasting, reconstruction, and feature engineering of multivariate processes. We integrate derivative-free optimization with an ensemble of sequence-to-sequence networks. We design a new resampling technique, called additive resampling, which, along with bootstrap aggregating (bagging), is applied to initialize the ensemble structure. We evaluate the performance of the proposed framework on three renewable energy sources: wind, solar, and ocean wave. We conduct several short- to long-term forecasts, showing the superiority of the proposed method compared to numerous machine learning techniques. The findings indicate that the introduced method performs comparatively better as the forecasting horizon grows longer. In addition, we modify the framework for automated feature selection. The model provides a clear interpretation of the selected features. We investigate the effects of different environmental and marine factors on wind speed and ocean output power, respectively, and report the selected features. Moreover, we explore the online forecasting setting and show that the model outperforms alternatives across different error measures.
- North America > United States (1.00)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Energy > Renewable > Wind (1.00)
- Energy > Renewable > Solar (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
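As a rough sketch of the bagging half of the ensemble initialization described in the abstract above (the paper's new additive resampling is its own contribution and is not reproduced here), each ensemble member can be trained on a bootstrap resample of the windowed series and the members' forecasts averaged. The window helper and the trivial mean forecaster below are placeholders for the paper's sequence-to-sequence networks; all names are ours.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Slice a univariate series into (input window, forecast window) pairs."""
    X, Y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        Y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(Y)

class MeanForecaster:
    """Placeholder for a seq2seq network: predicts window mean plus a bias."""
    def fit(self, X, Y):
        self.bias = (Y - X.mean(axis=1, keepdims=True)).mean(axis=0)
        return self
    def predict(self, X):
        return X.mean(axis=1, keepdims=True) + self.bias

def bagged_ensemble(series, n_in=24, n_out=6, n_members=10, seed=0):
    """Train each member on a bootstrap resample of the window pairs."""
    rng = np.random.default_rng(seed)
    X, Y = make_windows(series, n_in, n_out)
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        members.append(MeanForecaster().fit(X[idx], Y[idx]))
    return members

# Toy usage on a synthetic "wind speed" signal.
t = np.arange(500)
wind = 8 + 2 * np.sin(2 * np.pi * t / 48) + np.random.default_rng(1).normal(0, 0.5, 500)
ensemble = bagged_ensemble(wind)
last_window = wind[-24:][None, :]
forecast = np.mean([m.predict(last_window) for m in ensemble], axis=0)
print(forecast.round(2))
```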
Hyperparameter Optimization: A Spectral Approach
Hazan, Elad, Klivans, Adam, Yuan, Yang
We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm, an iterative application of compressed sensing techniques for orthogonal polynomials, requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep neural networks on CIFAR-10 show that, compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian optimization. We also outperform Random Search by 8×. Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity.
- Asia > China > Beijing > Beijing (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (13 more...)
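The spectral recipe in the abstract above can be caricatured in a few lines: sample Boolean hyperparameter settings uniformly, expand them into low-degree parity (monomial) features, and run a sparse recovery method such as the Lasso to find the few coefficients that matter. This is our own simplification, assuming ±1-encoded hyperparameters and a synthetic objective; the actual algorithm iterates this step and fixes the recovered variables between rounds.

```python
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def parity_features(X, degree=2):
    """All monomials (parities) of the ±1 variables up to the given degree."""
    n = X.shape[1]
    feats, names = [np.ones(len(X))], [()]
    for d in range(1, degree + 1):
        for S in itertools.combinations(range(n), d):
            feats.append(np.prod(X[:, S], axis=1))
            names.append(S)
    return np.column_stack(feats), names

rng = np.random.default_rng(0)
n_vars, n_samples = 10, 200

# Uniformly sample Boolean hyperparameter settings (easily parallelizable).
X = rng.choice([-1.0, 1.0], size=(n_samples, n_vars))
# Hidden objective, sparse in the parity basis: vars 0, 3 and pair (1, 2) matter.
y = 2 * X[:, 0] - 1.5 * X[:, 3] + X[:, 1] * X[:, 2] + rng.normal(0, 0.1, n_samples)

Phi, names = parity_features(X, degree=2)
coef = Lasso(alpha=0.05, fit_intercept=False).fit(Phi, y).coef_
for name, c in zip(names, coef):
    if abs(c) > 0.1:                     # the recovered important interactions
        print(name, round(c, 2))
```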
Learning to Learn without Gradient Descent by Gradient Descent
Chen, Yutian, Hoffman, Matthew W., Colmenarejo, Sergio Gomez, Denil, Misha, Lillicrap, Timothy P., Botvinick, Matt, de Freitas, Nando
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks, and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > British Columbia (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
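A minimal sketch of the setup described above, assuming PyTorch and random quadratics as the synthetic training functions (the paper trains on Gaussian process samples and uses a more elaborate architecture and loss): an LSTM consumes the previous query and its observed value and emits the next query, and its weights are meta-trained by backpropagating the sum of observed values through the unrolled rollout; all names here are ours.

```python
import torch
import torch.nn as nn

class RNNOptimizer(nn.Module):
    """LSTM that maps (previous query, previous value) to the next query."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.LSTMCell(dim + 1, hidden)
        self.head = nn.Linear(hidden, dim)

    def rollout(self, f, dim, horizon=20):
        state = (torch.zeros(1, self.hidden), torch.zeros(1, self.hidden))
        x, values = torch.zeros(1, dim), []
        for _ in range(horizon):
            y = f(x)
            state = self.cell(torch.cat([x, y.view(1, 1)], dim=1), state)
            x = self.head(state[0])        # next query proposed by the RNN
            values.append(f(x))
        return torch.stack(values).sum()   # meta-objective: sum along the rollout

# Meta-train on simple synthetic tasks by gradient descent through the
# unrolled optimizer, as in the paper's title.
opt_net = RNNOptimizer(dim=2)
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
for step in range(200):
    c = torch.randn(2)                     # a fresh random quadratic each episode
    f = lambda q: ((q - c) ** 2).sum()
    loss = opt_net.rollout(f, dim=2)
    meta_opt.zero_grad()
    loss.backward()                        # gradients flow through the rollout
    meta_opt.step()
print("final episode loss:", float(loss))
```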
Initializing Bayesian Hyperparameter Optimization via Meta-Learning
Feurer, Matthias, Springenberg, Jost Tobias, Hutter, Frank (University of Freiburg)
Model selection and hyperparameter optimization are crucial when applying machine learning to a novel dataset. Recently, a subcommunity of machine learning has focused on solving this problem with Sequential Model-based Bayesian Optimization (SMBO), demonstrating substantial success in many applications. However, for computationally expensive algorithms the overhead of hyperparameter optimization can still be prohibitive. In this paper we mimic a strategy human domain experts use: speeding up optimization by starting from promising configurations that performed well on similar datasets. The resulting initialization technique integrates naturally into the generic SMBO framework and can be trivially applied to any SMBO method. To validate our approach, we perform extensive experiments with two established SMBO frameworks (Spearmint and SMAC) with complementary strengths, optimizing two machine learning frameworks on 57 datasets. Our initialization procedure yields mild improvements for low-dimensional hyperparameter optimization and substantially improves the state of the art for the more complex combined algorithm selection and hyperparameter optimization problem.
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
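The warm-start strategy described in the abstract above amounts to a nearest-neighbour lookup over dataset meta-features: measure the distance between the new dataset's meta-features and those of previously seen datasets, then seed the SMBO initial design with the configurations that performed best on the closest ones. A minimal sketch follows, with made-up meta-features and a made-up performance table; the distance, k, and feature choices are our assumptions, not the paper's exact settings.

```python
import numpy as np

def warmstart_configs(new_meta, past_meta, best_configs, k=3):
    """Pick the best-known configuration from each of the k most similar
    past datasets (L1 distance over normalized meta-features) to use as
    the initial design for an SMBO run."""
    past = np.array(past_meta, dtype=float)
    scale = past.std(axis=0) + 1e-12           # normalize each meta-feature
    d = np.abs((past - np.asarray(new_meta)) / scale).sum(axis=1)
    return [best_configs[i] for i in np.argsort(d)[:k]]

# Made-up meta-features: (log #samples, log #features, class entropy).
past_meta = [(9.2, 3.1, 0.9), (6.5, 2.0, 0.4), (9.0, 3.0, 1.0), (4.1, 5.5, 0.7)]
# Best configuration found on each past dataset (e.g. SVM hyperparameters).
best_configs = [{"C": 10.0, "gamma": 0.01}, {"C": 0.1, "gamma": 1.0},
                {"C": 100.0, "gamma": 0.001}, {"C": 1.0, "gamma": 0.1}]

init = warmstart_configs(new_meta=(8.8, 3.2, 0.95),
                         past_meta=past_meta, best_configs=best_configs, k=2)
print(init)   # these seed the SMBO run before the model-based phase begins
```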