Efficient kernelized bandit algorithms via exploration distributions
Hu, Bingshan, He, Zheng, Sutherland, Danica J.
arXiv.org Artificial Intelligence
We consider a kernelized bandit problem with a compact arm set ${X} \subset \mathbb{R}^d$ and a fixed but unknown reward function $f^*$ with finite norm in some Reproducing Kernel Hilbert Space (RKHS). We propose a class of computationally efficient kernelized bandit algorithms, which we call GP-Generic, based on a novel concept: exploration distributions. This class of algorithms includes Upper Confidence Bound-based approaches as a special case, but also allows for a variety of randomized algorithms. With a careful choice of exploration distribution, our generic algorithm realizes a wide range of concrete algorithms that achieve $\tilde{O}(\gamma_T\sqrt{T})$ regret bounds, where $\gamma_T$ characterizes the RKHS complexity. This matches known results for UCB- and Thompson Sampling-based algorithms; we also show empirically that randomization can improve practical performance.
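To make the idea concrete, here is a minimal sketch of an exploration-distribution-style selection rule on a finite arm grid. This is an illustrative reading of the abstract, not the paper's actual algorithm: the function names (`gp_posterior`, `select_arm`), the RBF kernel choice, and the specific form of the index (posterior mean plus a randomly weighted posterior standard deviation) are all assumptions. A constant weight recovers a GP-UCB-style rule, while sampling the weight gives a randomized algorithm.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # RBF kernel matrix between the rows of a and b (assumed kernel choice)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(X_obs, y_obs, X_arms, noise=0.1):
    # Standard GP regression posterior mean and std. dev. at each arm
    K = rbf(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    Ks = rbf(X_arms, X_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    # Prior variance is 1 for the RBF kernel; subtract the explained part
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def select_arm(mu, sigma, draw_weight, rng):
    # Index rule: posterior mean plus a random exploration bonus.
    # draw_weight samples from the chosen exploration distribution;
    # a constant weight beta gives a deterministic, UCB-style rule.
    z = draw_weight(rng)
    return int(np.argmax(mu + z * sigma))
```

For example, `select_arm(mu, sigma, lambda r: 2.0, rng)` behaves like UCB with a fixed confidence width, while `lambda r: abs(r.standard_normal())` yields a randomized Thompson-Sampling-flavored rule.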
Jun-13-2025