Nonstochastic Multiarmed Bandits with Unrestricted Delays

Oct-9-2024, 12:47:40 GMT–Neural Information Processing Systems

We investigate multiarmed bandits with delayed feedback, where the delays need be neither identical nor bounded. We first prove that "delayed" Exp3 achieves the O(\sqrt{(KT + D)\ln K}) regret bound conjectured by Cesa-Bianchi et al. [2016] in the case of variable, but bounded, delays. Here, K is the number of actions and D is the total delay over T rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound but, like its predecessor, requires prior knowledge of D and T.
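The idea of running Exp3 on delayed feedback while skipping rounds with very large delays can be illustrated with a minimal sketch. This is not the authors' exact algorithm: the function name, the learning rate `eta`, the threshold `beta`, and the rule "discard feedback whose delay exceeds `beta`" are all assumptions made for illustration, and the abstract does not specify how these values are tuned from D and T.

```python
import math
import random


def delayed_exp3_with_skipping(losses, delays, eta, beta, seed=0):
    """Illustrative sketch (hypothetical names and parameters).

    losses[t][a] is the loss in [0, 1] of action a in round t.
    delays[t] >= 0 means feedback for round t arrives at the end of
    round t + delays[t]. Feedback with delay > beta is discarded.
    """
    rng = random.Random(seed)
    T, K = len(losses), len(losses[0])
    cum_est = [0.0] * K   # cumulative importance-weighted loss estimates
    arrivals = {}         # arrival round -> list of (action, prob, loss)
    total_loss, skipped = 0.0, 0

    for t in range(T):
        # Exponential weights over the estimates observed so far
        # (shifted by the minimum for numerical stability).
        m = min(cum_est)
        weights = [math.exp(-eta * (c - m)) for c in cum_est]
        z = sum(weights)
        probs = [w / z for w in weights]

        action = rng.choices(range(K), weights=probs)[0]
        loss = losses[t][action]
        total_loss += loss

        if delays[t] > beta:
            skipped += 1  # assumed skipping rule: drop this feedback
        else:
            arrivals.setdefault(t + delays[t], []).append(
                (action, probs[action], loss)
            )

        # Apply feedback arriving now, using the probability with which
        # the action was played at the time it was chosen.
        for a, p, l in arrivals.pop(t, []):
            cum_est[a] += l / p

    return total_loss, skipped, cum_est
```

For example, `delayed_exp3_with_skipping([[0.5, 0.5]] * 5, [0, 10, 1, 100, 0], eta=0.1, beta=5)` plays five rounds and discards the feedback of the two rounds whose delays (10 and 100) exceed the threshold.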

algorithm, nonstochastic multiarmed bandit, unrestricted delay
