Reviews: Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Neural Information Processing Systems 

However, I have a major issue with the proposed algorithms, which seem erroneous because the choice of the learning rate (\eta) requires knowledge of the delays (d_t), yet according to the problem formulation these are unknown to the learner. The optimal learning rate appears to depend on the delays (e.g., Thm 1, Line 146), so the learner cannot actually set it. If the proposed technique requires knowledge of the delays, which is precisely where the main challenge of the problem lies, then the technique seems pointless and the claims of the paper are vacuous.
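
To make the concern concrete, here is a minimal sketch. It assumes, purely for illustration, a delay-dependent tuning of the form \eta = sqrt(ln K / (K T + sum_t d_t)); this form, the function name `delay_tuned_eta`, and the values of K, T and the delays are assumptions of the sketch, not the exact expression from Thm 1. The point is only that such a rate takes the full delay sequence as an input, so it cannot be computed before the delays are observed.

```python
import math


def delay_tuned_eta(num_arms, horizon, delays):
    """Hypothetical delay-dependent learning rate (illustrative form only).

    Assumed form: eta = sqrt(ln K / (K * T + sum of delays)).
    This is NOT claimed to be the expression in the paper's Thm 1.
    """
    total_delay = sum(delays)  # needs every d_t before the first round is played
    return math.sqrt(math.log(num_arms) / (num_arms * horizon + total_delay))


K, T = 5, 1000  # hypothetical number of arms and horizon

# Two delay sequences the learner cannot tell apart in advance.
eta_no_delay = delay_tuned_eta(K, T, [0] * T)
eta_delayed = delay_tuned_eta(K, T, [20] * T)

print(eta_no_delay, eta_delayed)
```

The two calls return different values of \eta, so a learner that does not know the d_t in advance cannot choose between them.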