Fighting Bandits with a New Kind of Smoothness

Oct-11-2024, 13:35:02 GMT–Neural Information Processing Systems

We focus on the adversarial multi-armed bandit problem. The EXP3 algorithm of Auer et al. (2003) was shown to have a regret bound of O(\sqrt{T N \log N}), where T is the time horizon and N is the number of available actions (arms). More recently, Audibert and Bubeck (2009) improved the bound by a logarithmic factor via an entirely different method. In the present work, we provide a new set of analysis tools, using the notion of convex smoothing, to provide several novel algorithms with optimal guarantees. First we show that regularization via the Tsallis entropy matches the minimax rate of Audibert and Bubeck (2009) with an even tighter constant; it also fully generalizes EXP3.

fighting bandit, new kind, smoothness, (3 more...)

Neural Information Processing Systems

Oct-11-2024, 13:35:02 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology
  - Artificial Intelligence (0.82)
  - Data Science > Data Mining
    - Big Data (1.00)