On Kernelized Multi-Armed Bandits with Constraints
Neural Information Processing Systems
We study a stochastic bandit problem with a general unknown reward function and a general unknown constraint function. Both functions can be non-linear (even non-convex) and are assumed to lie in a reproducing kernel Hilbert space (RKHS) with a bounded norm.
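To make the setting concrete, here is a minimal sketch of one round of a kernelized bandit with a constraint, written under stated assumptions rather than as the method of this paper. It assumes a finite set of one-dimensional arms, a squared-exponential kernel, a constraint of the form g(x) <= threshold, and an optimistic selection rule (upper confidence bound for the reward, lower confidence bound for the constraint). The names `rbf_kernel`, `posterior`, `select_arm`, `beta`, `reg`, and `threshold` are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two arrays of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def posterior(x_obs, y_obs, x_query, reg=1e-2):
    """Kernel ridge (GP-style) posterior mean and standard deviation."""
    K = rbf_kernel(x_obs, x_obs) + reg * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_query)        # shape (n_obs, n_query)
    mean = k_star.T @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, k_star)
    var = 1.0 - np.sum(k_star * v, axis=0)     # k(x, x) = 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_arm(arms, x_obs, r_obs, c_obs, beta=2.0, threshold=0.0):
    """Return the index of the arm to play next.

    Assumed rule (not taken from the source): among arms whose optimistic
    constraint estimate (lower confidence bound) is <= threshold, pick the
    one with the largest reward upper confidence bound. If no arm looks
    feasible, fall back to the arm with the smallest constraint bound.
    """
    r_mean, r_std = posterior(x_obs, r_obs, arms)
    c_mean, c_std = posterior(x_obs, c_obs, arms)
    reward_ucb = r_mean + beta * r_std
    constraint_lcb = c_mean - beta * c_std
    feasible = constraint_lcb <= threshold
    if not feasible.any():
        return int(np.argmin(constraint_lcb))
    return int(np.argmax(np.where(feasible, reward_ucb, -np.inf)))

# Usage: one noisy observation at arm 0.0 that violates the constraint.
arms = np.linspace(0.0, 1.0, 21)
x_obs = np.array([0.0])
r_obs = np.array([0.0])   # observed reward (hypothetical data)
c_obs = np.array([5.0])   # observed constraint value, well above threshold
next_idx = select_arm(arms, x_obs, r_obs, c_obs)
```

Both unknown functions are modeled with the same kernel here purely for brevity; the bounded RKHS norm assumption is what would justify confidence bounds of this form, and the choice of `beta` in this sketch is arbitrary.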
Dec-23-2025, 16:37:47 GMT