Contextual Bandits under Delayed Feedback

Claire Vernade, Alexandra Carpentier, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Jul-5-2018–arXiv.org Machine Learning

Delayed feedback is a ubiquitous problem in many industrial systems employing bandit algorithms. Most of those systems seek to optimize binary indicators such as clicks. In that case, when the reward is not sent immediately, the learner cannot distinguish a negative signal from a not-yet-sent positive one: she might be waiting for feedback that will never come. In this paper, we define and address the contextual bandit problem with delayed and censored feedback by providing a new UCB-based algorithm. To demonstrate its effectiveness, we provide a finite-time regret analysis and an empirical evaluation that compares it against a baseline commonly used in practice.
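The censoring effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the algorithm from the paper: the function name `simulate_censored_feedback`, the exponential delay distribution, and the `window` cutoff after which the learner stops waiting are all assumptions made for illustration. It shows only that counting "no click seen yet" as a negative underestimates the click rate.

```python
import random


def simulate_censored_feedback(click_prob, mean_delay, window, horizon, seed=0):
    """Pull a single arm once per round and count true vs. observed clicks.

    Hypothetical setup (not taken from the paper):
    - each pull yields a click with probability `click_prob`;
    - a click arrives after an exponentially distributed delay (in rounds);
    - the learner stops waiting for a pull after `window` rounds, and
      nothing arriving after `horizon` is ever seen.
    """
    rng = random.Random(seed)
    true_clicks = 0
    observed_clicks = 0
    for t in range(horizon):
        if rng.random() >= click_prob:
            continue  # a genuine negative: no click will ever arrive
        true_clicks += 1
        delay = rng.expovariate(1.0 / mean_delay)
        # The click is observed only if it arrives in time; otherwise it
        # looks exactly like a negative to the learner.
        if delay <= window and t + delay <= horizon:
            observed_clicks += 1
    return true_clicks, observed_clicks


if __name__ == "__main__":
    horizon = 10_000
    true_clicks, observed_clicks = simulate_censored_feedback(
        click_prob=0.3, mean_delay=50.0, window=20.0, horizon=horizon
    )
    print("true click rate:     ", true_clicks / horizon)
    print("naive observed rate: ", observed_clicks / horizon)
```

Under these assumptions the naive observed rate falls below the true rate whenever delays often exceed the waiting window, which is the bias that a delay-aware estimator has to correct for.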

artificial intelligence, data mining, machine learning

Country:
- Europe > Germany
  - Saxony-Anhalt > Magdeburg (0.04)
  - Berlin (0.04)

Genre:
- Research Report (0.64)

Industry:
- Law > Civil Rights & Constitutional Law (0.49)
- Information Technology > Services (0.46)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.88)
