Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints

Neural Information Processing Systems 

We consider the problem of blocked collaborative bandits, in which there are multiple users, each with an associated multi-armed bandit problem. The users are grouped into latent clusters such that the mean reward vectors of users within the same cluster are identical. Our goal is to design algorithms that maximize the cumulative reward accrued by all users over time, under the constraint that no arm of a user is pulled more than B times. This problem was originally considered by [4], and designing regret-optimal algorithms for it has remained an open problem since. In this work, we propose an algorithm called B-LATTICE (Blocked Latent bAndiTs via maTrIx ComplEtion) that collaborates across users to maximize their cumulative rewards while satisfying the budget constraints.
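To make the problem setting concrete, the following is a minimal sketch of a toy environment with latent clusters and a per-(user, arm) pull budget B. It illustrates only the setting, not B-LATTICE itself. The class name, method names, and the Gaussian reward noise are assumptions made for illustration; the abstract does not specify a noise model or an interface.

```python
import numpy as np


class BlockedCollaborativeBandit:
    """Toy environment (hypothetical): users in latent clusters, with each
    (user, arm) pair allowed at most `budget` pulls."""

    def __init__(self, cluster_means, user_clusters, budget, noise_std=0.1, seed=0):
        # cluster_means[c, a]: mean reward of arm a for every user in cluster c.
        self.cluster_means = np.asarray(cluster_means, dtype=float)
        # user_clusters[u]: latent cluster index of user u (hidden from the learner).
        self.user_clusters = np.asarray(user_clusters, dtype=int)
        self.budget = int(budget)
        self.noise_std = float(noise_std)  # assumed Gaussian noise level
        self.rng = np.random.default_rng(seed)
        n_users = self.user_clusters.shape[0]
        n_arms = self.cluster_means.shape[1]
        # pulls[u, a]: how many times arm a has been pulled for user u so far.
        self.pulls = np.zeros((n_users, n_arms), dtype=int)

    def available_arms(self, user):
        """Arms of `user` whose budget is not yet exhausted."""
        return np.flatnonzero(self.pulls[user] < self.budget)

    def pull(self, user, arm):
        """Pull `arm` for `user` and return a noisy reward; enforce the budget."""
        if self.pulls[user, arm] >= self.budget:
            raise ValueError(f"arm {arm} of user {user} is blocked (budget exhausted)")
        self.pulls[user, arm] += 1
        mean = self.cluster_means[self.user_clusters[user], arm]
        return mean + self.noise_std * self.rng.normal()


# Example: 2 clusters, 3 arms, 4 users, budget B = 2.
env = BlockedCollaborativeBandit(
    cluster_means=[[0.9, 0.1, 0.5], [0.2, 0.8, 0.4]],
    user_clusters=[0, 0, 1, 1],
    budget=2,
)
reward = env.pull(user=0, arm=0)
```

Because users 0 and 1 share a cluster in this example, observations from user 0 are informative about user 1's arms. That sharing is what makes collaboration useful once a user's best arm becomes blocked and the learner must move to the next-best one.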
