Reviews: Nonstochastic Multiarmed Bandits with Unrestricted Delays

Neural Information Processing Systems 

The paper deals with algorithms and regret guarantees for the non-stochastic delayed reward bandit problem. The authors make three main contributions. For the setting of non-stochastic bandit problems with unknown, variable, but bounded delays the authors establish regret guarantees for the delayed EXP3 algorithm. These regret guarantees establish a conjecture of Cesa-Bianchi[2016]. For this setting the authors provide an algorithm that is a slight variant of delayed EXP3.