A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

Neural Information Processing Systems 

We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback.