On Kernelized Multi-Armed Bandits with Constraints

Neural Information Processing Systems 

We study a stochastic bandit problem with a general unknown reward function and a general unknown constraint function. Both functions can be non-linear (even non-convex) and are assumed to lie in a reproducing kernel Hilbert space (RKHS) with a bounded norm.
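To make the RKHS assumption concrete, the following is a minimal sketch, not the paper's construction. It builds two functions of the form f(x) = sum_i alpha_i k(c_i, x), whose RKHS norm is sqrt(alpha^T K alpha), and rescales each to a chosen norm bound. The squared-exponential kernel, the one-dimensional arm set, the bound `B`, the helper names, and the convention that an arm is feasible when g(x) <= 0 are all illustrative assumptions; the text above does not specify any of them.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_rkhs_function(centers, norm_bound, rng, lengthscale=0.5):
    """Build f(x) = sum_i alpha_i k(c_i, x) with RKHS norm equal to norm_bound."""
    K = rbf_kernel(centers, centers, lengthscale)
    alpha = rng.standard_normal(len(centers))
    # For a finite kernel expansion, ||f||_k^2 = alpha^T K alpha.
    norm = np.sqrt(alpha @ K @ alpha)
    alpha = alpha * (norm_bound / norm)  # rescale onto the norm ball's boundary

    def f(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return rbf_kernel(x, centers, lengthscale) @ alpha

    return f, alpha, K

rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 8)   # hypothetical expansion points
B = 2.0                              # hypothetical norm bound

reward_fn, alpha_f, K = sample_rkhs_function(centers, B, rng)
constraint_fn, alpha_g, _ = sample_rkhs_function(centers, B, rng)

rkhs_norm_f = float(np.sqrt(alpha_f @ K @ alpha_f))
rkhs_norm_g = float(np.sqrt(alpha_g @ K @ alpha_g))

# Discretized arm set; "feasible" uses the assumed convention g(x) <= 0.
arms = np.linspace(0.0, 1.0, 50)
feasible_arms = arms[constraint_fn(arms) <= 0.0]
```

Neither function is linear or convex in general, yet the norm bound still controls its size: since k(x, x) = 1 for this kernel, the reproducing property and Cauchy-Schwarz give |f(x)| <= B at every arm.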