A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback

Dec-24-2025, 04:18:18 GMT–Neural Information Processing Systems

We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multi-armed bandits with delayed feedback. In addition to the minimax-optimal adversarial regret guarantee shown by Zimmert and Seldin [2020], the modified tuning simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.

best-of-both-world algorithm, name change, regret guarantee, (10 more...)

Technology:
- Information Technology > Artificial Intelligence (0.39)