Nonstochastic Multiarmed Bandits with Unrestricted Delays

Thune, Tobias Sommer, Cesa-Bianchi, Nicolò, Seldin, Yevgeny

Jan-11-2020, 19:52:12 GMT–Neural Information Processing Systems

We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that "delayed" Exp3 achieves the $O(\sqrt{(KT D)\ln K})$ regret bound conjectured by Cesa-Bianchi et al. [2016] in the case of variable, but bounded delays. Here, $K$ is the number of actions and $D$ is the total delay over $T$ rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of $D$ and $T$.

algorithm, nonstochastic multiarmed bandit, unrestricted delay, (6 more...)

Neural Information Processing Systems

Jan-11-2020, 19:52:12 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.72)