A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
Neural Information Processing Systems
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multi-armed bandits with delayed feedback. In addition to the minimax optimal adversarial regret guarantee shown by Zimmert and Seldin, the modified algorithm simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.
Aug-14-2025, 17:12:30 GMT
- Country:
  - Europe
    - Denmark > Capital Region > Copenhagen (0.04)
    - United Kingdom > England > Cambridgeshire > Cambridge (0.04)