Non-Stationary Latent Bandits

Hong, Joey, Kveton, Branislav, Zaheer, Manzil, Chow, Yinlam, Ahmed, Amr, Ghavamzadeh, Mohammad, Boutilier, Craig

Dec-1-2020–arXiv.org Artificial Intelligence

Users of recommender systems often behave in a non-stationary fashion because their preferences and tastes evolve over time. In this work, we propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online from the user's interactions with the models. We call this problem a non-stationary latent bandit. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset. The main strength of our approach is that it can be combined with rich offline-learned models, which can be misspecified and are subsequently fine-tuned online using posterior sampling. In this way, we naturally combine the strengths of offline and online learning.
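The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes binary (Bernoulli) rewards, a small discrete set of latent states, and that both the per-state reward means (`means`) and the latent-state transition matrix (`transition`) are already known from offline learning. All function and parameter names here are hypothetical. The belief over latent states is maintained by standard forward filtering, and each round acts greedily under a latent state sampled from that belief.

```python
import random


def likelihood(reward, mean):
    """Bernoulli likelihood of a binary reward given its mean (assumed reward model)."""
    return mean if reward == 1 else 1.0 - mean


def update_belief(belief, arm, reward, means, transition):
    """Forward-filtering step: condition on the observed reward, then propagate."""
    n = len(belief)
    # Reweight each latent state by how well it explains the observed reward.
    weighted = [belief[s] * likelihood(reward, means[s][arm]) for s in range(n)]
    # Push the reweighted belief through the assumed transition matrix.
    predicted = [sum(weighted[s] * transition[s][t] for s in range(n)) for t in range(n)]
    total = sum(predicted)
    if total == 0.0:
        # Degenerate case: fall back to a uniform belief.
        return [1.0 / n] * n
    return [p / total for p in predicted]


def thompson_step(belief, means, rng):
    """Sample a latent state from the belief and pick the best arm under it."""
    state = rng.choices(range(len(belief)), weights=belief)[0]
    return max(range(len(means[state])), key=lambda a: means[state][a])


def simulate(means, transition, horizon, seed=0):
    """Run the sketch against a simulated user whose latent state drifts over time."""
    rng = random.Random(seed)
    n = len(means)
    belief = [1.0 / n] * n
    true_state = rng.randrange(n)
    total_reward = 0
    for _ in range(horizon):
        arm = thompson_step(belief, means, rng)
        reward = 1 if rng.random() < means[true_state][arm] else 0
        total_reward += reward
        belief = update_belief(belief, arm, reward, means, transition)
        # The simulated user's latent state evolves according to the same matrix.
        true_state = rng.choices(range(n), weights=transition[true_state])[0]
    return total_reward, belief
```

For example, `simulate([[0.9, 0.1], [0.1, 0.9]], [[0.95, 0.05], [0.05, 0.95]], horizon=200)` runs 200 rounds with two hypothetical user prototypes that prefer opposite arms. The sketch keeps the offline models fixed. Handling misspecified models, as the abstract describes, would additionally require a posterior over the model parameters themselves.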

artificial intelligence, data mining, latent state

Genre:
- Research Report (0.40)

Industry:
- Education (0.34)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning > Personal Assistant Systems (0.87)
  - Data Science > Data Mining (0.94)
