Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes
Sharma, Nihal; Sen, Rajat; Basu, Soumya; Shanmugam, Karthikeyan; Shakkottai, Sanjay
arXiv.org Artificial Intelligence
Recommendation systems for suggesting items to users are commonplace in online services such as marketplaces, content delivery platforms and ad placement systems. Such systems, over time, learn from user feedback, and improve their recommendations. An important caveat, however, is that both the distribution of user types and their respective preferences change over time, thus inducing changes in the optimal recommendation and requiring the system to periodically "reset" its learning. We consider systems with known change-points (aka episodes) in the distribution of user-features and preferences. Examples include seasonality in product recommendations where there are marked changes in interests based on time-of-year, or ad placements based on time-of-day. While a baseline strategy would be to re-learn the recommendation algorithm in each episode, it is often advantageous to share some learning across episodes. Specifically, one often has access to a (potentially very) large number of pre-trained recommendation algorithms (aka experts), and the goal then is to quickly determine (in an online manner) which expert is best suited to a specific episode.
Oct-27-2024