Reviews: Nonstochastic Multiarmed Bandits with Unrestricted Delays
–Neural Information Processing Systems
The paper deals with algorithms and regret guarantees for the non-stochastic delayed reward bandit problem. The authors make three main contributions. For the setting of non-stochastic bandit problems with unknown, variable, but bounded delays the authors establish regret guarantees for the delayed EXP3 algorithm. These regret guarantees establish a conjecture of Cesa-Bianchi[2016]. For this setting the authors provide an algorithm that is a slight variant of delayed EXP3.
Neural Information Processing Systems
Jan-21-2025, 13:25:24 GMT
- Technology: