Fighting Bandits with a New Kind of Smoothness

Neural Information Processing Systems 

We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the O(√(NT)) minimax regret with a smaller constant factor. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as O(√(NT log N)), as long as the perturbation distribution has a bounded hazard function. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property and lead to near-optimal algorithms.
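To illustrate the perturbation approach concretely, the sketch below (a minimal illustration, not the paper's full GBPA analysis; the function name `exp3_via_gumbel` and all parameter choices are ours) plays an adversarial bandit by perturbing cumulative reward estimates with Gumbel noise and pulling the argmax. By the Gumbel-max trick, this selection rule draws each arm with softmax probability, so the sampling probabilities are available in closed form for the importance-weighted reward estimates, and the resulting algorithm coincides with EXP3.

```python
import math
import random


def exp3_via_gumbel(T, n_arms, reward_fn, eta=0.05):
    """Adversarial bandit via a Gumbel-perturbed leader (illustrative sketch).

    Perturbing the reward estimates G_hat with i.i.d. Gumbel noise scaled by
    1/eta and taking the argmax selects arm i with probability proportional
    to exp(eta * G_hat[i]) -- the softmax -- so the importance weights below
    are exact and the method matches EXP3 on rewards in [0, 1].
    """
    G_hat = [0.0] * n_arms  # importance-weighted cumulative reward estimates
    total = 0.0
    for t in range(T):
        # Gumbel-max trick: sample standard Gumbel noise as -log(-log(U)).
        noise = [-math.log(-math.log(max(random.random(), 1e-12)))
                 for _ in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: G_hat[i] + noise[i] / eta)
        r = reward_fn(t, arm)  # bandit feedback: only the pulled arm's reward
        # Closed-form probability of having selected `arm` (softmax),
        # computed with a max-shift for numerical stability.
        m = max(G_hat)
        z = sum(math.exp(eta * (g - m)) for g in G_hat)
        p = math.exp(eta * (G_hat[arm] - m)) / z
        G_hat[arm] += r / p  # unbiased estimate of the arm's reward
        total += r
    return total
```

Swapping the Gumbel draw for Weibull, Fréchet, Pareto, or Gamma noise gives the other bounded-hazard instances mentioned above, though those lose the closed-form softmax probabilities and require estimating the selection probability (e.g. by resampling).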