hierarchical gp
HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes
Fan, Zhou, Han, Xinran, Wang, Zi
Bayesian optimization (BO), while proven highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that model their functions of interest well. Rather than specifying priors by hand, researchers have investigated transfer-learning-based methods to learn them automatically, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or to generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
Transfer Learning with Gaussian Processes for Bayesian Optimization
Tighineanu, Petru, Skubch, Kathrin, Baireuther, Paul, Reiss, Attila, Berkenkamp, Felix, Vinogradska, Julia
Bayesian optimization is a powerful paradigm to optimize black-box functions based on scarce and noisy data. Its data efficiency can be further improved by transfer learning from related tasks. While recent transfer models meta-learn a prior based on large amounts of data, in the low-data regime, methods that exploit the closed-form posterior of Gaussian processes (GPs) have an advantage. In this setting, several analytically tractable transfer-model posteriors have been proposed, but the relative advantages of these methods are not well understood. In this paper, we provide a unified view on hierarchical GP models for transfer learning, which allows us to analyze the relationship between methods. As part of the analysis, we develop a novel closed-form boosted GP transfer model that fits between existing approaches in terms of complexity. We evaluate the performance of the different approaches in large-scale experiments and highlight strengths and weaknesses of the different transfer-learning methods.
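To make the idea of a closed-form GP transfer posterior concrete, here is a minimal sketch of one simple two-stage scheme: a GP fitted to source-task data supplies the mean, and a second GP is fitted to the target-task residuals. This is an illustrative residual-stacking construction, not the boosted model proposed in the paper; the kernel, its hyperparameters, the toy data, and all function names are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    sqdist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_query, noise=1e-2):
    """Closed-form posterior mean of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    return k_qt @ np.linalg.solve(k_tt, y_train)

def two_stage_transfer_mean(x_src, y_src, x_tgt, y_tgt, x_query):
    """Source GP supplies the mean; a second GP models target residuals."""
    residuals = y_tgt - gp_posterior_mean(x_src, y_src, x_tgt)
    return (gp_posterior_mean(x_src, y_src, x_query)
            + gp_posterior_mean(x_tgt, residuals, x_query))

# Toy data (assumption): the target task is a shifted copy of the source task.
rng = np.random.default_rng(0)
x_src = np.linspace(0.0, 1.0, 30)
y_src = np.sin(6.0 * x_src) + 0.05 * rng.standard_normal(30)
x_tgt = np.array([0.1, 0.5, 0.9])
y_tgt = np.sin(6.0 * x_tgt) + 0.3
x_query = np.linspace(0.0, 1.0, 50)

transfer_mean = two_stage_transfer_mean(x_src, y_src, x_tgt, y_tgt, x_query)
target_only_mean = gp_posterior_mean(x_tgt, y_tgt, x_query)
```

With only three target observations, the target-only GP has little to go on, whereas the two-stage prediction reuses the shape learned from the source task. Both stages remain closed-form linear solves, which is the property the abstract highlights for the low-data regime.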
Structured Sparse Modelling with Hierarchical GP
Kuzin, Danil, Isupova, Olga, Mihaylova, Lyudmila
Sparse regression problems arise often in various applications, e.g., model selection, compressive sensing, EEG source localisation and gene modelling [1], [2]. One of the Bayesian approaches to forcing coefficients to be zero is the spike and slab prior [3]: each component is modelled as a mixture of a spike, which is a delta function at zero, and a slab, which is some vague distribution. Following the Bayesian approach, latent variables that are indicators of spikes are added to the model [4] and the relevant distribution is placed over them [5]. In this model each component is modelled as a spike or a slab independently. However, in many applications nonzero elements tend to appear in groups forming an unknown structure: wavelet coefficients of images are usually organised in trees [6], and chromosomes have a spatial structure along the genome [2]. We propose an extension of the spike and slab model by imposing a hierarchical Gaussian process (GP) prior on the latent variables. Such a hierarchical prior makes it possible to model spatial structural dependencies for coefficients that can evolve in time. The new model is flexible as spatial and temporal dependencies are decoupled by different levels of the hierarchical GP prior.
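As a rough illustration of how a GP prior on the latent variables can make nonzero coefficients cluster, the sketch below draws one sample from a spike and slab prior whose slab probabilities come from a smooth GP draw over coefficient positions. It covers a single spatial level only, with no temporal evolution. The logistic link, the RBF kernel and its hyperparameters, and the function names are assumptions for the example and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(positions, lengthscale=5.0, variance=4.0):
    """Squared-exponential covariance over coefficient positions."""
    diff = positions[:, None] - positions[None, :]
    return variance * np.exp(-0.5 * diff**2 / lengthscale**2)

def sample_structured_spike_slab(n_coeffs, rng, slab_std=1.0):
    """Draw coefficients whose spike/slab indicators are spatially correlated."""
    positions = np.arange(n_coeffs, dtype=float)
    # Small jitter keeps the covariance numerically positive definite.
    cov = rbf_kernel(positions) + 1e-6 * np.eye(n_coeffs)
    gp_draw = np.linalg.cholesky(cov) @ rng.standard_normal(n_coeffs)
    # Logistic link (assumption): maps the GP draw to a slab probability.
    slab_prob = 1.0 / (1.0 + np.exp(-gp_draw))
    # Indicator is True where the coefficient comes from the slab.
    indicators = rng.random(n_coeffs) < slab_prob
    slab_values = rng.normal(0.0, slab_std, size=n_coeffs)
    # Spike: exactly zero. Slab: a draw from a broad Gaussian.
    coeffs = np.where(indicators, slab_values, 0.0)
    return coeffs, indicators, slab_prob

coeffs, indicators, slab_prob = sample_structured_spike_slab(
    60, np.random.default_rng(1)
)
```

Because neighbouring positions share similar slab probabilities, the nonzero entries of `coeffs` tend to appear in contiguous runs rather than independently, which is the grouping behaviour the abstract motivates.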