Goto

Collaborating Authors

 rosenbrock


a57ecd54d4df7d999bd9c5e3b973ec75-Supplemental.pdf

Neural Information Processing Systems

Wecanseethis as the slope of the update function changes (middle row of Figure 1), these green lines correspond tothelocations givenbythearrowsinthetoprow.





An $(\epsilon,\delta)$-accurate level set estimation with a stopping criterion

Ishibashi, Hideaki, Matsui, Kota, Kutsukake, Kentaro, Hino, Hideitsu

arXiv.org Machine Learning

The level set estimation problem seeks to identify regions within a set of candidate points where an unknown and costly to evaluate function's value exceeds a specified threshold, providing an efficient alternative to exhaustive evaluations of function values. Traditional methods often use sequential optimization strategies to find $\epsilon$-accurate solutions, which permit a margin around the threshold contour but frequently lack effective stopping criteria, leading to excessive exploration and inefficiencies. This paper introduces an acquisition strategy for level set estimation that incorporates a stopping criterion, ensuring the algorithm halts when further exploration is unlikely to yield improvements, thereby reducing unnecessary function evaluations. We theoretically prove that our method satisfies $\epsilon$-accuracy with a confidence level of $1 - \delta$, addressing a key gap in existing approaches. Furthermore, we show that this also leads to guarantees on the lower bounds of performance metrics such as F-score. Numerical experiments demonstrate that the proposed acquisition function achieves comparable precision to existing methods while confirming that the stopping criterion effectively terminates the algorithm once adequate exploration is completed.


Global Optimization with A Power-Transformed Objective and Gaussian Smoothing

Xu, Chen

arXiv.org Artificial Intelligence

We propose a novel method that solves global optimization problems in two steps: (1) perform a (exponential) power-$N$ transformation to the not-necessarily differentiable objective function $f$ and get $f_N$, and (2) optimize the Gaussian-smoothed $f_N$ with stochastic approximations. Under mild conditions on $f$, for any $\delta>0$, we prove that with a sufficiently large power $N_\delta$, this method converges to a solution in the $\delta$-neighborhood of $f$'s global optimum point. The convergence rate is $O(d^2\sigma^4\varepsilon^{-2})$, which is faster than both the standard and single-loop homotopy methods if $\sigma$ is pre-selected to be in $(0,1)$. In most of the experiments performed, our method produces better solutions than other algorithms that also apply smoothing techniques.


Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

Xu, Zhitong, Zhe, Shandian

arXiv.org Artificial Intelligence

There has been a long-standing and widespread belief that Bayesian Optimization (BO) with standard Gaussian process (GP), referred to as standard BO, is ineffective in high-dimensional optimization problems. This perception may partly stem from the intuition that GPs struggle with high-dimensional inputs for covariance modeling and function estimation. While these concerns seem reasonable, empirical evidence supporting this belief is lacking. In this paper, we systematically investigated BO with standard GP regression across a variety of synthetic and real-world benchmark problems for high-dimensional optimization. Surprisingly, the performance with standard GP consistently ranks among the best, often outperforming existing BO methods specifically designed for high-dimensional optimization by a large margin. Contrary to the stereotype, we found that standard GP can serve as a capable surrogate for learning high-dimensional target functions. Without strong structural assumptions, BO with standard GP not only excels in high-dimensional optimization but also proves robust in accommodating various structures within the target functions. Furthermore, with standard GP, achieving promising optimization performance is possible by only using maximum likelihood estimation, eliminating the need for expensive Markov-Chain Monte Carlo (MCMC) sampling that might be required by more complex surrogate models. We thus advocate for a re-evaluation and in-depth study of the potential of standard BO in addressing high-dimensional problems.


Poisson Process for Bayesian Optimization

Wang, Xiaoxing, Li, Jiaxing, Xue, Chao, Liu, Wei, Liu, Weifeng, Yang, Xiaokang, Yan, Junchi, Tao, Dacheng

arXiv.org Artificial Intelligence

Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidates, which can be more robust to noise and have better practicality than absolute function responses, especially when the function responses are intractable but preferences can be acquired. To this end, we propose a novel rankingbased surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO). Two tailored acquisition functions are further derived from classic LCB and EI to accommodate it. Compared to the classic GP-BO method, our PoPBO has lower computation costs and better robustness to noise, which is verified by abundant experiments.


ELRA: Exponential learning rate adaption gradient descent optimization method

Kleinsorge, Alexander, Kupper, Stefan, Fauck, Alexander, Rothe, Felix

arXiv.org Artificial Intelligence

We present a novel, fast (exponential rate adaption), ab initio (hyper-parameter-free) gradient based optimizer algorithm. The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness, mainly striving for orthogonal neighboring gradients. The method has a high success and fast convergence rate and does not rely on hand-tuned parameters giving it greater universality. It can be applied to problems of any dimensions n and scales only linearly (of order O(n)) with the dimension of the problem. It optimizes convex and non-convex continuous landscapes providing some kind of gradient. In contrast to the Ada-family (AdaGrad, AdaMax, AdaDelta, Adam, etc.) the method is rotation invariant: optimization path and performance are independent of coordinate choices. The impressive performance is demonstrated by extensive experiments on the MNIST benchmark data-set against state-of-the-art optimizers. We name this new class of optimizers after its core idea Exponential Learning Rate Adaption - ELRA. We present it in two variants c2min and p2min with slightly different control. The authors strongly believe that ELRA will open a completely new research direction for gradient descent optimize.


Bayesian Optimization with Informative Covariance

Eduardo, Afonso, Gutmann, Michael U.

arXiv.org Artificial Intelligence

Bayesian optimization is a methodology for global optimization of unknown and expensive objectives. It combines a surrogate Bayesian regression model with an acquisition function to decide where to evaluate the objective. Typical regression models are given by Gaussian processes with stationary covariance functions. However, these functions are unable to express prior input-dependent information, including possible locations of the optimum. The ubiquity of stationary models has led to the common practice of exploiting prior information via informative mean functions. In this paper, we highlight that these models can perform poorly, especially in high dimensions. We propose novel informative covariance functions for optimization, leveraging nonstationarity to encode preferences for certain regions of the search space and adaptively promote local exploration during optimization. We demonstrate that the proposed functions can increase the sample efficiency of Bayesian optimization in high dimensions, even under weak prior information.