Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
Gaussian process (GP) bandits [Srinivas et al., 2010] are a powerful framework for sequential decision-making tasks that minimize regret defined by a black-box reward function belonging to a known reproducing kernel Hilbert space (RKHS). Applications span many fields, such as robotics [Berkenkamp et al., 2021], experimental design [Lei et al., 2021], and hyperparameter tuning [Snoek et al., 2012]. Many existing studies have sought theoretical guarantees for the regret. The established work of Srinivas et al. [2010] showed upper bounds on the cumulative regret of the GP upper confidence bound (GP-UCB) algorithm. Subsequently, Valko et al. [2013] showed a tighter regret upper bound for the SupKernelUCB algorithm. Scarlett et al. [2017] showed a lower bound on the regret, which implies that the regret upper bound of [Valko et al., 2013] is near-optimal; that is, the upper bound matches the lower bound up to poly-logarithmic factors. Several studies have since tackled obtaining near-optimal GP-bandit algorithms. Vakili et al. [2021] proposed maximum variance reduction (MVR), which is shown to be near-optimal with respect to the simple regret incurred by the last recommended action.
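For concreteness, the quantities discussed above can be written as follows; this is a sketch using the standard notation of the GP-bandit literature (f is the reward function over a domain \mathcal{X}, x_t the action at round t, \hat{x}_T the final recommendation, and \mu_{t-1}, \sigma_{t-1} the GP posterior mean and standard deviation, with confidence parameter \beta_t). The cumulative regret, the simple regret, and the GP-UCB selection rule of Srinivas et al. [2010] are
\[
R_T = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}} f(x) - f(x_t) \Bigr), \qquad
r_T = \max_{x \in \mathcal{X}} f(x) - f(\hat{x}_T), \qquad
x_t = \operatorname*{argmax}_{x \in \mathcal{X}} \, \mu_{t-1}(x) + \beta_t^{1/2} \sigma_{t-1}(x).
\]
Under this notation, the bounds of Srinivas et al. [2010], Valko et al. [2013], and Scarlett et al. [2017] concern R_T, while the guarantee for MVR [Vakili et al., 2021] concerns r_T.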