Improved Regret Analysis in Gaussian Process Bandits: Optimality for Noiseless Reward, RKHS norm, and Non-Stationary Variance
Gaussian process (GP) bandits [Srinivas et al., 2010] are a powerful framework for sequential decision-making tasks that minimize regret defined by a black-box reward function belonging to a known reproducing kernel Hilbert space (RKHS). Applications span many fields, such as robotics [Berkenkamp et al., 2021], experimental design [Lei et al., 2021], and hyperparameter tuning [Snoek et al., 2012]. Many existing studies have sought theoretical guarantees for the regret. The established work of Srinivas et al. [2010] showed upper bounds on the cumulative regret of the GP upper confidence bound (GP-UCB) algorithm. Subsequently, Valko et al. [2013] showed a tighter regret upper bound for the SupKernelUCB algorithm. Scarlett et al. [2017] showed a lower bound on the regret, which implies that the regret upper bound of [Valko et al., 2013] is near-optimal; that is, the upper bound matches the lower bound up to poly-logarithmic factors. Several studies have since tackled obtaining near-optimal GP-bandit algorithms. Vakili et al. [2021] proposed maximum variance reduction (MVR), which is shown to be near-optimal with respect to the simple regret incurred by the last recommended action.
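For concreteness, the quantities discussed above can be written as follows; this is a sketch using the standard notation of the GP-bandit literature (f is the reward function over a domain \mathcal{X}, x_t the action at round t, \hat{x}_T the final recommendation, and \mu_{t-1}, \sigma_{t-1} the GP posterior mean and standard deviation, with confidence parameter \beta_t). The cumulative regret, the simple regret, and the GP-UCB selection rule of Srinivas et al. [2010] are
\[
R_T = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}} f(x) - f(x_t) \Bigr), \qquad
r_T = \max_{x \in \mathcal{X}} f(x) - f(\hat{x}_T), \qquad
x_t = \operatorname*{argmax}_{x \in \mathcal{X}} \, \mu_{t-1}(x) + \beta_t^{1/2} \sigma_{t-1}(x).
\]
Under this notation, the bounds of Srinivas et al. [2010], Valko et al. [2013], and Scarlett et al. [2017] concern R_T, while the guarantee for MVR [Vakili et al., 2021] concerns r_T.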