Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes

Sharma, Nihal, Sen, Rajat, Basu, Soumya, Shanmugam, Karthikeyan, Shakkottai, Sanjay

Oct-27-2024 – arXiv.org Artificial Intelligence

Recommendation systems for suggesting items to users are commonplace in online services such as marketplaces, content delivery platforms and ad placement systems. Such systems learn from user feedback over time and improve their recommendations. An important caveat, however, is that both the distribution of user types and their respective preferences change over time, which changes the optimal recommendation and requires the system to periodically "reset" its learning. We consider systems with known change-points in the distribution of user features and preferences; these change-points divide time into episodes. Examples include seasonality in product recommendations, where interests change markedly with the time of year, and ad placements that depend on the time of day. While a baseline strategy would be to re-learn the recommendation algorithm in each episode, it is often advantageous to share some learning across episodes. Specifically, one often has access to a (potentially very) large number of pre-trained recommendation algorithms (aka experts), and the goal is then to quickly determine, in an online manner, which expert is best suited to a specific episode.
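To make the setting concrete, the baseline strategy mentioned above (re-learning from scratch in each episode) can be sketched as a standard UCB1 bandit over experts that is restarted at every known change-point. This is only an illustration under stated assumptions, not the algorithm proposed in the paper: the function name `run_episodic_ucb`, the Bernoulli reward simulation, and the example mean rewards are all hypothetical.

```python
import math
import random


def run_episodic_ucb(expert_means_per_episode, episode_length, seed=0):
    """UCB1 over experts, restarted at every known change-point.

    expert_means_per_episode[e][k] is the (hypothetical) mean reward of
    expert k during episode e. Rewards are simulated as Bernoulli draws,
    purely for illustration.
    Returns the index of the most frequently chosen expert in each episode.
    """
    rng = random.Random(seed)
    most_pulled = []
    for means in expert_means_per_episode:
        n_experts = len(means)
        # Reset all statistics: nothing is shared across episodes.
        counts = [0] * n_experts
        totals = [0.0] * n_experts
        for t in range(1, episode_length + 1):
            if t <= n_experts:
                # Try every expert once before using confidence bounds.
                k = t - 1
            else:
                # Pick the expert with the largest upper confidence bound.
                k = max(
                    range(n_experts),
                    key=lambda i: totals[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]),
                )
            reward = 1.0 if rng.random() < means[k] else 0.0
            counts[k] += 1
            totals[k] += reward
        most_pulled.append(max(range(n_experts), key=lambda i: counts[i]))
    return most_pulled


# Hypothetical example: the best expert changes between two episodes.
best_per_episode = run_episodic_ucb(
    [[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]], episode_length=2000
)
```

Because the counts and reward totals are cleared at each change-point, this baseline pays the full exploration cost in every episode. That repeated cost is the motivation for sharing some learning across episodes.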

artificial intelligence, data mining, machine learning

Country:
- North America > United States (0.92)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Marketing (0.92)
- Media > Film (0.46)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Data Science > Data Mining (1.00)