Lindauer, Marius, Feurer, Matthias, Eggensperger, Katharina, Biedenkapp, André, Hutter, Frank

Treating the validation loss of trained machine learning models as a black-box function $f$, we can formulate the hyperparameter optimization problem as

$$x^* \in \arg\min_{x \in \mathcal{X}} f(x) \tag{1}$$

where $\mathcal{X}$ is the space of possible configurations $x$. Although the community is aware of the necessity of hyperparameter optimization (HPO) for machine learning algorithms, the impact of the hyperparameters of Bayesian optimization (BO) itself is not reported in most BO papers. On top of this, new BO approaches (and implicitly their hyperparameters) are often developed on cheap-to-evaluate artificial functions and then evaluated on real benchmarks. Although we acknowledge that this is a reasonable protocol to prevent over-engineering on the target function family (here, for example, HPO benchmarks of machine learning algorithms), we believe that it is important to study whether this practice is indeed well-founded. We emphasize that this paper considers HPO on two levels, as shown in Figure 1: (i) HPO of machine learning algorithms, which we consider as our target function (target-BO), and (ii) optimization of the target-BO's own choices using a meta-optimizer. In particular, we study several research questions related to the meta-optimization problem of BO's hyperparameters:

1. How large is the impact of tuning BO's own hyperparameters if one were allowed to tune these on each function independently?
2. How well does the performance of an optimized configuration of the target-BO generalize to similar new functions from the same family?

Many methods exist for function optimization, such as randomly sampling the variable search space, called random search, or systematically evaluating samples in a grid across the search space, called grid search. More principled methods are able to learn from sampling the space so that future samples are directed toward the parts of the search space that are most likely to contain the extrema. A directed approach to global optimization that uses probability is called Bayesian Optimization.
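The two undirected baselines mentioned above can be sketched in a few lines. This is a minimal illustration, not code from any of the works listed here: the objective `toy_objective`, the bounds, and the evaluation budgets are assumptions chosen so that the grid happens to contain the true minimizer.

```python
import itertools
import random


def toy_objective(x, y):
    """Hypothetical cheap objective with its minimum at (1.0, -2.0)."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def grid_search(f, bounds, points_per_dim):
    """Evaluate f on a regular grid; return (best_value, best_point)."""
    axes = []
    for low, high in bounds:
        step = (high - low) / (points_per_dim - 1)
        axes.append([low + i * step for i in range(points_per_dim)])
    return min((f(*point), point) for point in itertools.product(*axes))


def random_search(f, bounds, n_samples, seed=0):
    """Evaluate f at uniformly sampled points; return (best_value, best_point)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        point = tuple(rng.uniform(low, high) for low, high in bounds)
        candidate = (f(*point), point)
        if best is None or candidate < best:
            best = candidate
    return best


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
# Same budget for both methods: 11 * 11 = 121 evaluations.
grid_best = grid_search(toy_objective, bounds, points_per_dim=11)
rand_best = random_search(toy_objective, bounds, n_samples=121)
print("grid:  ", grid_best)
print("random:", rand_best)
```

Neither method uses earlier evaluations to decide where to sample next, which is the gap that a directed approach is meant to close.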

Chen, Xi, Wang, Yining, Wang, Yu-Xiang

We consider a non-stationary sequential stochastic optimization problem, in which the underlying cost functions change over time under a variation budget constraint. We propose an $L_{p,q}$-variation functional to quantify the change, which captures local spatial and temporal variations of the sequence of functions. Under the $L_{p,q}$-variation functional constraint, we derive both upper and matching lower regret bounds for smooth and strongly convex function sequences, which generalize previous results in (Besbes et al., 2015). Our results reveal some surprising phenomena under this general variation functional, such as the curse of dimensionality of the function domain. The key technical novelties in our analysis include an affinity lemma that characterizes the distance of the minimizers of two convex functions with bounded $L_p$ difference, and a cubic spline based construction that attains matching lower bounds.

Candelieri, Antonio, Giordani, Ilaria, Perego, Riccardo, Archetti, Francesco

Bayesian Optimization has become the reference method for the global optimization of black-box, expensive, and possibly noisy functions. Bayesian Optimization learns a probabilistic model of the objective function, usually a Gaussian Process, and builds, depending on its mean and variance, an acquisition function whose optimizer yields the new evaluation point, which is then used to update the probabilistic surrogate model. Despite its sample efficiency, Bayesian Optimization does not scale well with the dimension of the problem. The optimization of the acquisition function has received less attention because its computational cost is usually considered negligible compared to that of evaluating the objective function. Its efficient optimization is often inhibited, particularly in high-dimensional problems, by multiple extrema. In this paper we leverage the additionality of the objective function to map both the kernel and the acquisition function of Bayesian Optimization into lower-dimensional subspaces. This approach makes learning and updating the probabilistic surrogate model more efficient and allows an efficient optimization of the acquisition function. Experimental results are presented for a real-life application: the control of pumps in urban water distribution systems.
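The loop described above (fit a Gaussian Process, build an acquisition function from its mean and variance, evaluate the acquisition optimizer, update the model) can be sketched as follows. This is a one-dimensional illustration under stated assumptions, not the additive method of the paper: the RBF kernel, its length scale, the lower-confidence-bound acquisition, the candidate grid, and `toy_objective` are all choices made for this example.

```python
import math
import random


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel for scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf_kernel(a, x_new) for a in xs]
    alpha = solve(K, ys)      # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, low, high, n_init=3, n_iter=10, kappa=2.0, seed=0):
    """Minimize f on [low, high] with a GP surrogate and an LCB acquisition."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]
    ys = [f(x) for x in xs]
    grid = [low + (high - low) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        def acquisition(c):
            mean, std = gp_posterior(xs, ys, c)
            return mean - kappa * std  # lower confidence bound
        # Optimize the acquisition on a grid, skipping evaluated points.
        candidates = [c for c in grid if c not in xs]
        x_next = min(candidates, key=acquisition)
        xs.append(x_next)
        ys.append(f(x_next))  # the expensive evaluation in a real setting
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs, ys


def toy_objective(x):
    """Hypothetical stand-in for an expensive black-box function."""
    return math.sin(3.0 * x) + x ** 2 - 0.7 * x


best_x, best_y, xs, ys = bayes_opt(toy_objective, -1.0, 2.0)
print(best_x, best_y)
```

The grid search over candidates is only viable in one dimension. In higher dimensions the acquisition function has to be optimized numerically and may have multiple extrema, which is the cost the abstract refers to.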

Merchán, Eduardo C. Garrido, Pérez, Luis C. Jariego

Bayesian Optimization is the state-of-the-art technique for the optimization of black boxes, i.e., functions whose analytical expression and gradients are unavailable, that are expensive to evaluate, and whose evaluations are noisy. The most popular application of Bayesian optimization is the automatic hyperparameter tuning of machine learning algorithms, where we obtain the best configuration of a machine learning algorithm by optimizing an estimate of its generalization error. Despite being applied with success, Bayesian optimization methodologies also have hyperparameters that need to be configured, such as the probabilistic surrogate model or the acquisition function used. A poor choice of these hyperparameters leads to poor results. Typically, these hyperparameters are tuned by making assumptions about the objective function that we want to evaluate, but there are scenarios where we do not have any prior information about the objective function. In this paper, we propose a first attempt at automatic Bayesian optimization by exploring several heuristics that automatically tune the acquisition function of Bayesian optimization. We illustrate the effectiveness of these heuristics on a set of benchmark problems and a hyperparameter tuning problem of a machine learning algorithm.
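To see why the acquisition function counts as a hyperparameter, it helps to compare two standard choices on the same surrogate predictions. The sketch below is not one of the heuristics proposed in the paper. It uses the usual closed forms of probability of improvement and expected improvement for minimization under a Gaussian predictive distribution, and the two candidate points with their means and standard deviations are made-up values.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mean, std, f_best, xi=0.0):
    """P(f(x) < f_best - xi) for a Gaussian prediction (minimization)."""
    if std == 0.0:
        return 1.0 if mean < f_best - xi else 0.0
    return normal_cdf((f_best - mean - xi) / std)


def expected_improvement(mean, std, f_best, xi=0.0):
    """E[max(f_best - xi - f(x), 0)] for a Gaussian prediction."""
    gain = f_best - mean - xi
    if std == 0.0:
        return max(gain, 0.0)
    z = gain / std
    return gain * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical surrogate predictions as (mean, std) per candidate:
# A is slightly better than the incumbent and almost certain,
# B looks worse on average but is highly uncertain.
candidates = {"A": (0.9, 0.05), "B": (1.2, 0.6)}
f_best = 1.0  # best objective value observed so far (assumed)

pick_pi = max(candidates,
              key=lambda k: probability_of_improvement(*candidates[k], f_best))
pick_ei = max(candidates,
              key=lambda k: expected_improvement(*candidates[k], f_best))
print(pick_pi, pick_ei)  # prints "A B"
```

With these numbers, probability of improvement prefers the safe candidate A while expected improvement prefers the uncertain candidate B, so the same surrogate model leads to different next evaluations depending on the acquisition function.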