Contextual Bandits under Delayed Feedback

Vernade, Claire, Carpentier, Alexandra, Zappella, Giovanni, Ermis, Beyza, Brueckner, Michael

arXiv.org Machine Learning 

Delayed feedback is an ubiquitous problem in many industrial systems employing bandit algorithms. Most of those systems seek to optimize binary indicators as clicks. In that case, when the reward is not sent immediately, the learner cannot distinguish a negative signal from a not-yet-sent positive one: she might be waiting for a feedback that will never come. In this paper, we define and address the contextual bandit problem with delayed and censored feedback by providing a new UCB-based algorithm. In order to demonstrate its effectiveness, we provide a finite time regret analysis and an empirical evaluation that compares it against a baseline commonly used in practice.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found