
Collaborating Authors

 Lee, Eric Hans


SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees

arXiv.org Artificial Intelligence

Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease of use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and has developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not model-aware; rather, they are designed to apply to a generic model, which leaves significant optimization performance on the table. Second, using these systems requires domain knowledge, such as the choice of hyperparameter search space, which is antithetical to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
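To make concrete the generic, model-unaware tuning workflow that the abstract contrasts Mulch against, here is a minimal random-search sketch over a GBT-style hyperparameter space. The search space and the objective are hypothetical stand-ins (a real run would evaluate validation error of a trained GBT); none of this is Mulch's method.

```python
import random

# Hypothetical GBT hyperparameter search space (illustrative only).
SEARCH_SPACE = {
    "learning_rate": (0.01, 0.3),   # continuous; log-uniform is more typical in practice
    "max_depth": (2, 10),           # integer
    "n_estimators": (50, 500),      # integer
}

def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {
        "learning_rate": rng.uniform(*SEARCH_SPACE["learning_rate"]),
        "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
        "n_estimators": rng.randint(*SEARCH_SPACE["n_estimators"]),
    }

def validation_error(cfg):
    """Synthetic stand-in for a GBT's validation error (hypothetical).

    In a real pipeline this would train a GBT with cfg and score it
    on held-out data; here it is a cheap analytic surrogate.
    """
    return ((cfg["learning_rate"] - 0.1) ** 2
            + 0.001 * abs(cfg["max_depth"] - 5)
            + 0.0001 * abs(cfg["n_estimators"] - 200))

def random_search(n_trials=100, seed=0):
    """Model-unaware black-box tuning: evaluate n_trials random configs."""
    rng = random.Random(seed)
    trials = [(validation_error(c), c)
              for c in (sample_config(rng) for _ in range(n_trials))]
    return min(trials, key=lambda t: t[0])

best_err, best_cfg = random_search()
```

Note that the user must still hand-pick `SEARCH_SPACE` bounds; automating exactly that choice is one of the two improvements the abstract claims.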


Achieving Diversity in Objective Space for Sample-efficient Search of Multiobjective Optimization Problems

arXiv.org Artificial Intelligence

As mathematical, statistical, and machine learning algorithms leverage increasingly powerful computational hardware to perform elaborate tasks, simulation has grown to play a key role in fields such as materials science, operations research, industrial engineering, aerodynamics, pharmaceuticals, image processing, and many others. In particular, a key use of these simulations is to serve as a surrogate for eventual implementation and/or manufacturing during design optimization; running a computational simulation is likely much cheaper than actually conducting a physical experiment or fabrication (Forrester et al. 2008; Negoescu et al. 2011; Molesky et al. 2018; Haghanifar et al. 2020). Computational simulations can, however, easily run for hours or days, making simulation itself an often costly proposition. The high cost of a single simulation is compounded by the frequent need to simulate many different systems to search for a set of desirable outcomes. This is the motivating force behind simulation optimization, which seeks to identify suitable system parameters that achieve a satisfactory system in a sample-efficient fashion, i.e., with as few simulations conducted as possible. In practical situations, simulations almost always have multiple competing objectives which define success, and thus it is important for users to understand the trade-offs between these competing objectives in order to make an informed design decision. Multiobjective optimization tackles this problem by identifying the Pareto frontier, which is the manifold in objective space on which no objective can be improved without harming another. Unfortunately, using the Pareto frontier as the measurement of success may be limiting in engineering and design applications.
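For a finite sample of objective vectors, the Pareto frontier the abstract describes reduces to a non-dominated filter. The sketch below (for minimization in all objectives) is a standard construction, not the paper's diversity-seeking method:

```python
import numpy as np

def pareto_frontier(Y):
    """Return the non-dominated rows of Y (minimization in all objectives).

    A point y dominates y' if y <= y' componentwise with strict
    inequality in at least one coordinate.
    """
    Y = np.asarray(Y, dtype=float)
    keep = np.ones(len(Y), dtype=bool)
    for i, y in enumerate(Y):
        if not keep[i]:
            continue  # already dominated; its dominator filters for it
        # Mark every point strictly dominated by y (y never dominates itself).
        dominated = np.all(Y >= y, axis=1) & np.any(Y > y, axis=1)
        keep &= ~dominated
    return Y[keep]

# Two competing objectives: the frontier trades one off against the other.
Y = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
front = pareto_frontier(Y)  # [3, 3] is dominated by [2, 2] and drops out
```

Because dominance is transitive, skipping already-dominated points does not change the result; the remaining rows are exactly the sampled trade-off surface.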


A Nonmyopic Approach to Cost-Constrained Bayesian Optimization

arXiv.org Machine Learning

Bayesian optimization (BO) is a popular method for optimizing expensive-to-evaluate black-box functions. BO budgets are typically given in iterations, which implicitly assumes each evaluation has the same cost. In fact, in many BO applications, evaluation costs vary significantly across different regions of the search space. In hyperparameter optimization, the time spent on neural network training increases with layer size; in clinical trials, the monetary costs of drug compounds vary; and in optimal control, control actions have differing complexities. Cost-constrained BO measures convergence with alternative cost metrics such as time, money, or energy, for which the sample efficiency of standard BO methods is ill-suited. For cost-constrained BO, cost efficiency is far more important than sample efficiency. In this paper, we formulate cost-constrained BO as a constrained Markov decision process (CMDP), and develop an efficient rollout approximation to the optimal CMDP policy that takes both the cost and future iterations into account. We validate our method on a collection of hyperparameter optimization problems as well as a sensor set selection application.
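As a point of reference, a common myopic baseline in cost-aware BO (distinct from the nonmyopic rollout policy this paper develops) is expected improvement per unit cost, which divides the standard EI acquisition by an estimated evaluation cost. A minimal sketch, assuming a Gaussian posterior at the candidate point and a known positive cost:

```python
import math

def expected_improvement(mu, sigma, best):
    """Standard EI for minimization under a Gaussian posterior N(mu, sigma^2),
    where `best` is the incumbent (lowest observed) value."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal cdf
    return (best - mu) * Phi + sigma * phi

def ei_per_unit_cost(mu, sigma, best, cost):
    """Myopic cost-aware acquisition: EI divided by the evaluation cost."""
    return expected_improvement(mu, sigma, best) / cost

# A cheap, mediocre candidate can outrank an expensive, promising one --
# exactly the greedy behavior a nonmyopic policy tries to improve on.
cheap = ei_per_unit_cost(mu=0.5, sigma=0.2, best=0.4, cost=1.0)
pricey = ei_per_unit_cost(mu=0.2, sigma=0.2, best=0.4, cost=20.0)
```

Here the expensive point has over five times the raw EI, yet the cost division makes the cheap point win; a rollout policy would instead weigh how each evaluation spends the remaining cost budget.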


Scaling Gaussian Process Regression with Derivatives

arXiv.org Artificial Intelligence

Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at $n$ points in $d$ dimensions requires linear solves and log determinants with an ${n(d+1) \times n(d+1)}$ positive definite matrix -- leading to prohibitive $\mathcal{O}(n^3d^3)$ computations for standard direct methods. We propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, enable Bayesian optimization with derivatives to scale to high-dimensional problems and large evaluation budgets.
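To make the $n(d+1) \times n(d+1)$ structure concrete, the sketch below assembles the joint value-and-gradient covariance for an RBF kernel. This dense assembly is the standard construction whose $\mathcal{O}(n^3d^3)$ direct solves the paper avoids; it is not the paper's MVM-based solver.

```python
import numpy as np

def rbf_with_grads(X, lengthscale=1.0, variance=1.0):
    """Dense joint covariance of [value; gradient] observations for an RBF kernel.

    For k(x, z) = s^2 exp(-||x - z||^2 / (2 l^2)):
      dk/dx_i          = -k (x_i - z_i) / l^2
      d2k/(dx_i dz_j)  =  k (delta_ij / l^2 - (x_i - z_i)(x_j - z_j) / l^4)
    The result is n(d+1) x n(d+1), hence O(n^3 d^3) for direct factorization.
    """
    n, d = X.shape
    l2 = lengthscale ** 2
    diff = X[:, None, :] - X[None, :, :]                          # (n, n, d)
    K = variance * np.exp(-np.sum(diff**2, axis=-1) / (2 * l2))   # (n, n)

    # Per-pair (d+1) x (d+1) blocks: [value, d/dx_1, ..., d/dx_d].
    blocks = np.zeros((n, n, d + 1, d + 1))
    blocks[:, :, 0, 0] = K
    blocks[:, :, 1:, 0] = -K[:, :, None] * diff / l2              # grad w.r.t. x
    blocks[:, :, 0, 1:] = K[:, :, None] * diff / l2               # grad w.r.t. z
    blocks[:, :, 1:, 1:] = K[:, :, None, None] * (
        np.eye(d) / l2 - diff[:, :, :, None] * diff[:, :, None, :] / l2**2
    )
    # Interleave so row p*(d+1)+a is derivative a at point p.
    return blocks.transpose(0, 2, 1, 3).reshape(n * (d + 1), n * (d + 1))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))       # n = 5 points in d = 3 dimensions
Kfull = rbf_with_grads(X)             # 20 x 20 symmetric PSD matrix
```

Even at this toy scale the matrix is $(d+1)$ times larger per point than a value-only kernel matrix, which is why fast MVMs plus pivoted Cholesky preconditioning matter as $n$ and $d$ grow.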