Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

Oct-9-2024, 14:00:59 GMT–Neural Information Processing Systems

We study the structure of regret-minimizing policies in the {\em many-armed} Bayesian multi-armed bandit problem: in particular, with k the number of arms and T the time horizon, we consider the case where k \geq \sqrt{T} . We first show that {\em subsampling} is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to sub-optimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples \Theta(\sqrt{T}) arms and executes UCB only on that subset, is rate-optimal. Despite theoretically optimal regret, even SS-UCB performs poorly due to excessive exploration of suboptimal arms.

greedy algorithm, multi-armed bandit, unreasonable effectiveness, (4 more...)

Neural Information Processing Systems

Oct-9-2024, 14:00:59 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology
  - Artificial Intelligence (1.00)
  - Data Science > Data Mining
    - Big Data (1.00)