Reviews: Nonstochastic Multiarmed Bandits with Unrestricted Delays

Jan-21-2025, 13:25:24 GMT–Neural Information Processing Systems

The paper studies algorithms and regret guarantees for the non-stochastic bandit problem with delayed rewards. The authors make three main contributions. For non-stochastic bandit problems with unknown, variable, but bounded delays, the authors establish regret guarantees for the delayed EXP3 algorithm. These guarantees confirm a conjecture of Cesa-Bianchi et al. [2016]. For this setting, the authors provide an algorithm that is a slight variant of delayed EXP3.
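To make the setting concrete, here is a minimal Python sketch of an EXP3-style learner whose feedback arrives late. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `delayed_exp3`, the input format (`losses`, `delays`), and the fixed learning rate `eta` are all hypothetical choices made for this example. The learner plays from exponential weights over the loss estimates seen so far, and applies the importance-weighted estimate for round t only once that round's feedback arrives.

```python
import math
import random
from collections import defaultdict


def delayed_exp3(losses, delays, eta, seed=0):
    """Run an EXP3-style learner whose feedback arrives after a delay.

    losses[t][a] is the loss in [0, 1] of arm a at round t; delays[t] is the
    number of rounds before the loss of round t is revealed (hypothetical
    input format). Returns the list of arms played and the total loss incurred.
    """
    rng = random.Random(seed)
    num_rounds, num_arms = len(losses), len(losses[0])
    est = [0.0] * num_arms        # cumulative importance-weighted loss estimates
    arrivals = defaultdict(list)  # arrival round -> [(arm, loss, prob at play time)]
    played, total_loss = [], 0.0

    for t in range(num_rounds):
        # Exponential weights over the estimates observed so far
        # (shifted by the minimum for numerical stability).
        base = min(est)
        weights = [math.exp(-eta * (e - base)) for e in est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        arm = rng.choices(range(num_arms), weights=probs)[0]
        played.append(arm)
        total_loss += losses[t][arm]

        # The loss of round t becomes visible only at the end of round t + delays[t].
        arrivals[t + delays[t]].append((arm, losses[t][arm], probs[arm]))

        # Apply every piece of feedback that arrives at the end of this round.
        for a, loss, p in arrivals.pop(t, []):
            est[a] += loss / p

    return played, total_loss
```

For example, `delayed_exp3([[0.0, 1.0]] * 2000, [3] * 2000, eta=0.1)` runs the learner on a two-armed problem in which every loss is revealed three rounds late. Feedback scheduled to arrive after the final round is simply never used, which is one simple way to handle delays that extend past the horizon.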

algorithm, nonstochastic multiarmed bandit, unrestricted delay
