Multiscale Non-stationary Stochastic Bandits

Ding, Qin, Hsieh, Cho-Jui, Sharpnack, James

Feb-12-2020–arXiv.org Machine Learning

Classic contextual bandit algorithms for linear models, such as LinUCB, assume that the reward distribution for an arm is modeled by a stationary linear regression. When the linear regression model is non-stationary over time, the regret of LinUCB can scale linearly with time. In this paper, we propose a novel multiscale changepoint detection method for the non-stationary linear bandit problems, called Multiscale-LinUCB, which actively adapts to the changing environment. We also provide theoretical analysis of regret bound for Multiscale-LinUCB algorithm. Experimental results show that our proposed Multiscale-LinUCB algorithm outperforms other state-of-the-art algorithms in non-stationary contextual environments.

algorithm, changepoint, stationary period, (15 more...)

arXiv.org Machine Learning

Feb-12-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States > California
  - Los Angeles County > Los Angeles (0.14)
  - Yolo County > Davis (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.87)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Regression (0.74)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found