Comparator-adaptive Convex Bandits
van der Hoeven, Dirk, Cutkosky, Ashok, Luo, Haipeng
We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for linear bandits. Then, we extend the ideas to convex bandits with Lipschitz or smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses.
Jul-16-2020
- Country:
- North America > United States
- California (0.14)
- Europe > Netherlands
- South Holland > Leiden (0.04)
- North America > United States
- Genre:
- Research Report > New Finding (0.46)
- Technology: