On Regret with Multiple Best Arms

Oct-10-2024, 10:06:18 GMT–Neural Information Processing Systems

We study a regret minimization problem in the multi-armed bandit setting in the presence of multiple best or near-optimal arms. We consider the case where the number of arms (actions) is comparable to or much larger than the time horizon, and we make no assumptions about the structure of the bandit instance. Our goal is to design algorithms that automatically adapt to the unknown hardness of the problem, i.e., the number of best arms. Our setting captures many modern applications of bandit algorithms where the action space is enormous and information about the underlying instance or structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and derive a theoretical bound on its regret.
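The setting can be made concrete with a small simulation. The sketch below is not the adaptive algorithm described above. It is a common non-adaptive baseline for the many-arms regime that assumes the number of best arms m is known: draw about (n/m) ln T arms uniformly at random, so that the subsample contains a best arm with probability at least 1 - 1/T (because (1 - m/n)^k <= e^(-km/n)), then run UCB1 on the subsample only. The Bernoulli instance, the means 0.9 and 0.9 - gap, and the function names `make_instance`, `subsample_size` and `subsampled_ucb` are hypothetical choices made for illustration.

```python
import math
import random


def make_instance(n_arms, n_best, gap, seed=0):
    """Hypothetical Bernoulli instance: n_best arms with mean 0.9, all others 0.9 - gap."""
    means = [0.9] * n_best + [0.9 - gap] * (n_arms - n_best)
    random.Random(seed).shuffle(means)  # hide where the best arms are
    return means


def subsample_size(n_arms, n_best, horizon):
    """k = ceil((n/m) ln T), clamped to [1, n]. Requires the (normally unknown) m >= 1.

    P(no best arm among k arms drawn without replacement)
        <= (1 - m/n)^k <= exp(-k m / n) <= 1/T.
    """
    k = max(1, math.ceil(n_arms / n_best * math.log(horizon)))
    return min(n_arms, k)


def subsampled_ucb(means, horizon, k, seed=0):
    """Draw k arms uniformly without replacement, then run UCB1 on them only.

    Returns the pseudo-regret: the sum over t of (best mean - mean of the pulled arm),
    measured against the best arm of the full instance, not just the subsample.
    """
    rng = random.Random(seed)
    best = max(means)
    arms = rng.sample(range(len(means)), k)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1  # initialization: pull each subsampled arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            i = max(range(k), key=lambda j: (sums[j] / counts[j]
                                             + math.sqrt(2 * math.log(t) / counts[j])))
        reward = 1.0 if rng.random() < means[arms[i]] else 0.0
        counts[i] += 1
        sums[i] += reward
        regret += best - means[arms[i]]
    return regret


# Usage: n = 10,000 arms, m = 100 best arms, horizon T = 2,000 (so n >> T).
means = make_instance(n_arms=10_000, n_best=100, gap=0.2)
k = subsample_size(n_arms=10_000, n_best=100, horizon=2_000)
print(k, subsampled_ucb(means, horizon=2_000, k=k))
```

The dependence of `subsample_size` on `n_best` is exactly what a hardness-agnostic algorithm has to avoid: when m is unknown, the subsample size cannot be chosen this way directly.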

algorithm, hardness, multiple best arm