Review for NeurIPS paper: On Regret with Multiple Best Arms

Jan-25-2025, 05:01:23 GMT–Neural Information Processing Systems

Summary and Contributions: The authors consider a stochastic bandit setting with n arms, where n is taken to be very large. Unlike other large-armed bandit work, there is no structure assumed between the payoff of the various arms. Their methods rely heavily on MOSS [Audibert et al.], which attains an optimal regret rate of \sqrt{nT} in the usual setting where n T. Letting m n denote the number of arms that are optimal (or near optimal), the authors begin by defining alpha satisfying n/m T {\alpha} as a key quantity characterizing the hardness. They observe that if alpha were known, one could sample a subset of the arms of cardinality T alpha log T, which contains an optimal arm whp. Applying MOSS attains a regret bound of O(T (1 alpha)/2).

algorithm, multiple best arm, neurips paper, (4 more...)

Neural Information Processing Systems

Jan-25-2025, 05:01:23 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence (0.40)