Near-Optimal Algorithm for Non-Stationary Kernelized Bandits

Shogo Iwazaki, Shion Takeno

arXiv.org Machine Learning 

The kernelized bandit (KB) problem [Srinivas et al., 2010], also called the Gaussian process bandit or Bayesian optimization, is an important sequential decision-making problem in which one seeks to minimize the regret under an unknown reward function by sequentially acquiring function evaluations. As the name suggests, in the KB problem the underlying reward function is assumed to be an element of a reproducing kernel Hilbert space (RKHS) induced by a known, fixed kernel function. KB has been applied in many domains, such as materials discovery [Ueno et al., 2016], drug discovery [Korovina et al., 2020], and robotics [Berkenkamp et al., 2023]. In addition, near-optimal KB algorithms, whose regret upper bounds match the regret lower bound derived in Scarlett et al. [2017], have been established [Camilleri et al., 2021, Salgia et al., 2021, Li and Scarlett, 2022, Salgia et al., 2024].

Non-stationary KB [Bogunovic et al., 2016] considers optimization in a non-stationary environment; that is, the reward function may change over time within some RKHS. This extension is crucial in many practical applications where the objective function varies over time, such as financial markets [Heaton and Lucas, 1999] and recommender systems [Hariri et al., 2015]. For example, Zhou and Shroff [2021] and Deng et al. [2022] have proposed upper confidence bound (UCB)-based algorithms for the non-stationary KB problem and derived upper bounds on the cumulative regret. Recently, Hong et al. [2023] have proposed an optimization-based KB algorithm.
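To make the regret notions above concrete, the following is a standard formulation from the KB literature; the notation here is illustrative, and the paper's exact definitions may differ. In the stationary setting the learner chooses a point x_t at each round t and competes against a fixed maximizer, whereas in the non-stationary setting the comparator changes with the time-varying reward f_t:

\[
R_T = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}} f(x) - f(x_t) \Bigr) \quad \text{(stationary)},
\qquad
R_T = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}} f_t(x) - f_t(x_t) \Bigr) \quad \text{(non-stationary)},
\]

where each $f_t$ lies in the RKHS $\mathcal{H}_k$ with $\|f_t\|_{\mathcal{H}_k} \le B$, and the amount of change is typically controlled by a variation budget such as $\sum_{t=1}^{T-1} \|f_{t+1} - f_t\|_{\mathcal{H}_k} \le B_T$.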
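As a rough illustration of the UCB-based approach mentioned above, the sketch below implements a sliding-window GP-UCB-style strategy in Python: the GP posterior is refit on only the W most recent observations, so stale data from a drifted reward is discarded. This is not the algorithm of Zhou and Shroff [2021] or Deng et al. [2022]; the RBF kernel, window size W, confidence weight beta, and the toy drifting objective f_t are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    # Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def gp_posterior(X_obs, y_obs, X_cand, noise_var=0.01):
    # Standard GP posterior mean/variance given noisy observations.
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    k_star = rbf_kernel(X_obs, X_cand)
    mu = k_star.T @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, k_star)
    var = 1.0 - np.sum(k_star * v, axis=0)  # prior variance k(x, x) = 1
    return mu, np.maximum(var, 1e-12)

def sliding_window_gp_ucb(f_t, X_cand, T=200, W=50, beta=2.0):
    # Sliding-window GP-UCB: at each round, fit the GP on the W most
    # recent observations and pick the candidate maximizing mu + beta*sigma.
    X_obs, y_obs, regret = [], [], []
    rng = np.random.default_rng(0)
    for t in range(T):
        if X_obs:
            mu, var = gp_posterior(np.array(X_obs[-W:]),
                                   np.array(y_obs[-W:]), X_cand)
            x = X_cand[int(np.argmax(mu + beta * np.sqrt(var)))]
        else:
            x = X_cand[rng.integers(len(X_cand))]
        y = f_t(t, x) + 0.1 * rng.standard_normal()  # noisy evaluation
        X_obs.append(x)
        y_obs.append(y)
        # Per-round regret against the current (time-varying) maximizer.
        f_vals = np.array([f_t(t, xc) for xc in X_cand])
        regret.append(f_vals.max() - f_t(t, x))
    return np.cumsum(regret)

# Toy non-stationary reward on [0, 1]: the peak drifts slowly over time.
X_cand = np.linspace(0, 1, 101)[:, None]
f_t = lambda t, x: np.exp(-50 * (x[0] - (0.2 + 0.5 * t / 200)) ** 2)
print(sliding_window_gp_ucb(f_t, X_cand)[-1])  # final cumulative regret
```

A fixed beta is used for brevity; the theoretically analyzed algorithms instead let the confidence width grow with t and tune the window (or restart/forgetting schedule) to the variation budget.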