A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

Neural Information Processing Systems 

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multi-armed bandits with delayed feedback. In addition to the minimax-optimal adversarial regret guarantee shown by Zimmert and Seldin, it simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.
