Regret in Online Recommendation Systems

Neural Information Processing Systems 

This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of m users, arrives. The decision-maker observes the user and selects an item from a catalogue of n items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown, and the performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities.