Reviews: Bandit Learning with Implicit Feedback

Oct-8-2024, 05:58:29 GMT–Neural Information Processing Systems

Summary: This work considers learning user preferences with a bandit model. The reward depends not only on the user's judgement of an arm but also on whether the user examined the arm at all; that is, feedback = examination × judgement. In particular, if a user does not examine an arm, the lack of feedback does not necessarily indicate that the user does not "like" the arm. The work uses a latent-variable model for the (unobserved) examination of arms and posits that the probability of positive (binary) feedback can be expressed as the product of the probability of examination (logistic) and the probability of a positive judgement (logistic). It proposes a variational inference (VI) approach to estimating the parameters and then uses Thompson Sampling from the approximate posterior as the policy. This allows the authors to use machinery from Russo and Van Roy to obtain regret bounds.
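To make the factorization in the summary concrete, here is a minimal sketch of the feedback model and one Thompson Sampling step. It is an illustration under stated assumptions, not the authors' implementation: the function names, the split of each arm into separate judgement and examination feature vectors, and the diagonal Gaussian form of the approximate posterior are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def feedback_probability(x_judge, x_exam, theta_judge, theta_exam):
    """P(positive feedback) = P(examined) * P(positive judgement).

    Both factors are logistic in their own features, as in the summary.
    """
    p_exam = sigmoid(dot(x_exam, theta_exam))
    p_judge = sigmoid(dot(x_judge, theta_judge))
    return p_exam * p_judge


def sample_diag_gaussian(mean, std, rng):
    """Draw one parameter vector from a diagonal Gaussian (assumed posterior form)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def thompson_select(arms, post_judge, post_exam, rng):
    """One Thompson Sampling step.

    arms:       list of (x_judge, x_exam) feature pairs, one per arm
    post_judge: (mean, std) of the approximate posterior over theta_judge
    post_exam:  (mean, std) of the approximate posterior over theta_exam
    Returns the index of the arm with the highest sampled feedback probability.
    """
    theta_judge = sample_diag_gaussian(post_judge[0], post_judge[1], rng)
    theta_exam = sample_diag_gaussian(post_exam[0], post_exam[1], rng)
    scores = [
        feedback_probability(x_j, x_e, theta_judge, theta_exam)
        for x_j, x_e in arms
    ]
    return max(range(len(arms)), key=scores.__getitem__)
```

The product form is what separates "not liked" from "not seen": an arm with a high judgement score but a low examination score still yields a low probability of positive feedback, so missing feedback alone says little about preference. The posterior update itself (the VI step) is omitted here, since the summary does not give its details.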

bandit learning, implicit feedback, synthetic data

Industry:
- Education (0.30)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.39)
  - Representation & Reasoning > Personal Assistant Systems (0.36)