
Neural Information Processing Systems

Thus, most meta- and transfer-learning HPO methods [7-16] consider a restrictive setting where all tasks must share the same set of hyperparameters so that the input data can be represented as fixed-sized vectors.



Towards Learning Universal Hyperparameter Optimizers with Transformers

Neural Information Processing Systems

Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution. However, existing methods are restricted to learning from experiments sharing the same set of hyperparameters. In this paper, we introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction when trained on vast tuning data from the wild, such as Google's Vizier database, one of the world's largest HPO datasets. Our extensive experiments demonstrate that the OptFormer can simultaneously imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates. Compared to a Gaussian Process, the OptFormer also learns a robust prior distribution for hyperparameter response functions, and can thereby provide more accurate and better calibrated predictions. This work paves the path to future extensions for training a Transformer-based model as a general HPO optimizer.
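The abstract's "universal end-to-end interface" rests on representing tuning data as text so that studies with different hyperparameter sets can share one model. A minimal sketch of that idea is below; the field names, separators, and `<trial>`/`<study>` markers are illustrative assumptions, not OptFormer's actual token format.

```python
# Hypothetical sketch: serializing hyperparameter tuning trials as flat
# text, in the spirit of OptFormer's text-based interface. Because the
# representation is a string rather than a fixed-size vector, studies
# with different hyperparameter sets fit the same interface.

def serialize_trial(params: dict, objective: float) -> str:
    """Render one tuning trial as a flat text segment."""
    kv = ",".join(f"{k}={v}" for k, v in sorted(params.items()))
    return f"<trial> {kv} | y={objective:.4f} </trial>"

def serialize_study(metadata: dict, trials: list) -> str:
    """Concatenate study metadata and trial history into one string
    that an autoregressive Transformer could consume."""
    header = ";".join(f"{k}:{v}" for k, v in sorted(metadata.items()))
    body = " ".join(serialize_trial(p, y) for p, y in trials)
    return f"<study> {header} </study> {body}"

history = [({"lr": 0.01, "layers": 2}, 0.81),
           ({"lr": 0.10, "layers": 3}, 0.77)]
text = serialize_study({"task": "image-classification"}, history)
```

With such a serialization, both "suggest the next hyperparameters" and "predict the resulting objective" reduce to generating the next tokens of the sequence, which is how the paper frames joint policy and function learning.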



BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

Hung, Yu-Heng, Lin, Kai-Jie, Lin, Yu-Heng, Wang, Chien-Yi, Sun, Cheng, Hsieh, Ping-Chun

arXiv.org Artificial Intelligence

Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results, given their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffers from the hypervolume identifiability issue, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose BOFormer, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer consistently outperforms benchmark rule-based and learning-based algorithms on various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.
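The "hypervolume identifiability issue" the abstract mentions concerns the hypervolume indicator, the standard scalar measure of Pareto-front quality that MOBO methods try to increase. For concreteness, here is a minimal two-objective version of that indicator; this is a generic textbook computation, not BOFormer's implementation, and it assumes both objectives are minimized with a reference point dominated by all candidates.

```python
# Minimal 2-D hypervolume indicator: the area dominated by the Pareto
# front of `points`, bounded by reference point `ref` (both objectives
# minimized). MOBO acquisition rules typically score a candidate by how
# much it would increase this quantity.

def hypervolume_2d(points, ref):
    """Area dominated by the non-dominated subset of `points` w.r.t. `ref`."""
    # Sort by first objective; sweep to keep only non-dominated points.
    pts = sorted(points)
    front, best_y = [], float("inf")
    for x, y in pts:
        if y < best_y:          # strictly improves the second objective
            front.append((x, y))
            best_y = y
    # Sum horizontal slabs between consecutive front points.
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv
```

Note that the hypervolume of the current front depends on the whole history of evaluated points, not just the latest one, which is the non-Markovian structure the paper's Q-learning framework is built to handle.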



Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

Dery, Lucio M., Friesen, Abram L., De Freitas, Nando, Ranzato, Marc'Aurelio, Chen, Yutian

arXiv.org Artificial Intelligence

As machine learning permeates more industries and models become more expensive and time-consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing. Multi-step planning-based approaches to hyperparameter optimization promise improved efficiency over myopic alternatives by more effectively balancing exploration and exploitation. However, the potential of these approaches has not been fully realized due to their technical complexity and computational intensity. In this work, we leverage recent advances in Transformer-based, natural-language-interfaced hyperparameter optimization to circumvent these barriers. We build on top of the recently proposed OptFormer, which casts both hyperparameter suggestion and target function approximation as autoregressive generation, thus making planning via rollouts simple and efficient. We conduct an extensive exploration of different strategies for performing multi-step planning on top of the OptFormer model to highlight its potential for use in constructing non-myopic HPO strategies.


Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

#artificialintelligence

Unlike myopic HPO methods, planning-based approaches fundamentally require building models of the future to assess the impact of a current decision on later timesteps. Though these methods also rely on a GP as a surrogate model, each point in multi-step planning involves fantasizing/imagining an updated GP posterior p(f̃_{t+1} | x̃_t), …, p(f̃_{t+h} | x̃_t, x̃_{t+1}, …, x̃_{t+h−1}) based on simulated choices from lookaheads {(x̃_t, ỹ_t), …, (x̃_{t+h−1}, ỹ_{t+h−1})} (Lam et al., 2016; Jiang et al., 2020). Note that we use x̃_t to represent a fantasized decision, while x_t is the actual choice made at timestep t. Whilst multi-step planning is promising, constructing the posterior of a GP model requires matrix inversion, which is a compute-intensive operation (Cormen et al., 2022). Even outside of this limitation, traditional planning-based approaches are compute intensive due to (i) the poor scaling behavior of the search tree, O(q^h), where q is the number of choices at each decision point for each lookahead step (Lam et al., 2016; Lam and Willcox, 2017), which forces most methods to explore short horizons, typically h ∈ {1, 2}; and (ii) nested expectation and maximization: marginalizing over future observations ỹ_{t+j}, j ≤ h, and performing a global search on the acquisition function to obtain the query x̃_{t+j} at every lookahead step.
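The O(q^h) search tree described above can be made concrete with a small rollout sketch. This is a generic illustration under stated simplifications: `surrogate` stands in for any model that can fantasize an outcome ỹ for a candidate x̃ given the history (a GP posterior mean or an OptFormer-style sequence model), its call signature is an assumption, and the nested expectation over ỹ is collapsed to a single point prediction for brevity.

```python
# Exhaustive h-step lookahead over a discrete candidate set, showing
# why the tree has q^h leaves: each of the h levels branches over all
# q candidates, and each branch conditions the surrogate on the
# fantasized history accumulated so far.

def rollout_value(surrogate, history, candidates, h):
    """Best cumulative fantasized objective reachable within h steps."""
    if h == 0:
        return 0.0
    best = float("-inf")
    for x in candidates:                      # q branches per level
        y = surrogate(history, x)             # fantasized observation y~
        future = rollout_value(surrogate, history + [(x, y)], candidates, h - 1)
        best = max(best, y + future)
    return best

def plan_first_step(surrogate, history, candidates, h):
    """Pick the actual next query x_t by maximizing the h-step rollout."""
    def score(x):
        y = surrogate(history, x)
        return y + rollout_value(surrogate, history + [(x, y)], candidates, h - 1)
    return max(candidates, key=score)

# Toy deterministic surrogate (illustrative only): predicted objective
# shrinks slightly as the fantasized history grows.
toy = lambda hist, x: x - 0.1 * len(hist)
```

Replacing the GP in this loop with an autoregressive model that emits ỹ directly is exactly the simplification the OptFormer-based planner exploits: fantasizing becomes cheap token generation instead of repeated posterior updates with matrix inversion.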