Learning Contextual Bandits in a Non-stationary Environment

Wu, Qingyun, Iyer, Naveen, Wang, Hongning

May-23-2018–arXiv.org Machine Learning

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and in many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which rarely holds in practice because users' preferences are dynamic. This mismatch inevitably leaves a recommender system with consistently suboptimal performance. In this paper, we consider the situation where the underlying reward distribution remains unchanged over (possibly short) epochs and shifts at unknown time instants. To address this, we propose a contextual bandit algorithm that detects possible changes in the environment based on its reward estimation confidence and updates its arm selection strategy accordingly. A rigorous regret upper bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world recommendation datasets confirm its practical utility in a changing environment.
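The abstract does not spell out the detection rule, so the following is only a minimal sketch of the general idea of change detection from reward estimation confidence, not the authors' algorithm. It assumes a linear reward model with a LinUCB-style ridge estimate, and it resets the learner when too many recent rewards fall outside the learner's own confidence interval. The class name `ChangeDetectingLinUCB` and the parameters `alpha`, `window`, `max_miss_rate`, and `noise_tol` are hypothetical choices made for illustration.

```python
import numpy as np


class ChangeDetectingLinUCB:
    """Illustrative LinUCB-style learner that resets itself when recent
    rewards repeatedly fall outside its own confidence interval.
    All names and thresholds are assumptions, not taken from the paper."""

    def __init__(self, dim, alpha=1.0, window=20, max_miss_rate=0.5, noise_tol=0.1):
        self.dim = dim
        self.alpha = alpha                  # width multiplier of the confidence interval
        self.window = window                # number of recent observations inspected
        self.max_miss_rate = max_miss_rate  # miss fraction that triggers a reset
        self.noise_tol = noise_tol          # slack allowed for reward noise
        self.resets = 0
        self._reset()

    def _reset(self):
        # Discard all statistics gathered under the (presumed) old environment.
        self.A = np.eye(self.dim)
        self.b = np.zeros(self.dim)
        self.misses = []

    def _estimate(self, x):
        # Ridge-regression reward estimate and its confidence width for context x.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        width = self.alpha * np.sqrt(x @ A_inv @ x)
        return theta @ x, width

    def select(self, contexts):
        # Pick the arm with the highest upper confidence bound.
        scores = [sum(self._estimate(x)) for x in contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        # Check the observed reward against the current confidence interval.
        mean, width = self._estimate(x)
        miss = abs(reward - mean) > width + self.noise_tol
        self.misses = (self.misses + [miss])[-self.window:]
        # A full window dominated by misses is treated as a change of environment.
        if len(self.misses) == self.window and np.mean(self.misses) > self.max_miss_rate:
            self._reset()
            self.resets += 1
        # Standard ridge-regression update with the new observation.
        self.A += np.outer(x, x)
        self.b += reward * x
```

In use, a caller would alternate `select(contexts)` and `update(x, reward)` on each round and could inspect `resets` to see how often a change was flagged. Resetting the whole model is the simplest possible reaction to a detected change; keeping several models and choosing among them is another design option.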

data mining, machine learning, slave model


Country:
- North America > United States > Virginia (0.28)

Genre:
- Research Report (0.50)

Industry:
- Media > Music (0.67)
- Leisure & Entertainment (0.67)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Personal Assistant Systems (1.00)
    - Machine Learning (1.00)
