Weighted Linear Bandits for Non-Stationary Environments

Russac, Yoan; Vernade, Claire; Cappé, Olivier

Mar-19-2020, 01:32:52 GMT, Neural Information Processing Systems

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary over time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, in which exponential weights are used to smoothly forget the past. Its analysis requires studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments.
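The abstract does not spell out the algorithm, so the following is only a simplified sketch, under our own assumptions, of the two ingredients it names: a discounted (exponentially weighted) least-squares estimator and an optimistic action choice. With discount factor gamma in (0, 1), the estimate after round t minimizes sum over s <= t of gamma^(t-s) * (r_s - <a_s, theta>)^2 + lam * ||theta||^2, so an observation from k rounds ago carries weight gamma^k. The class name `DiscountedLinUCBSketch`, the parameters `gamma`, `lam`, `beta`, the regularization convention, and the constant exploration width `beta` with a LinUCB-style bonus ||a||_{V^{-1}} are all hypothetical choices; they are not the paper's exact confidence bound.

```python
import numpy as np


class DiscountedLinUCBSketch:
    """Hypothetical sketch: discounted ridge regression plus an optimistic (UCB) choice.

    Assumption: the bonus beta * ||a||_{V^-1} and the fixed ridge term lam * I are
    illustrative choices, not the exact confidence bound of D-LinUCB.
    """

    def __init__(self, dim, gamma=0.99, lam=1.0, beta=1.0):
        self.gamma = gamma  # discount factor in (0, 1); smaller values forget faster
        self.lam = lam  # ridge regularization, kept constant over time
        self.beta = beta  # exploration width (assumed constant here)
        self.V = lam * np.eye(dim)  # sum_s gamma^(t-s) a_s a_s^T + lam * I
        self.b = np.zeros(dim)  # sum_s gamma^(t-s) r_s a_s

    def theta_hat(self):
        # Weighted least-squares estimate: solve V theta = b
        return np.linalg.solve(self.V, self.b)

    def select(self, actions):
        # actions: (K, dim) array of context vectors; returns the optimistic index
        theta = self.theta_hat()
        V_inv = np.linalg.inv(self.V)
        means = actions @ theta
        # Quadratic form a_i^T V^-1 a_i for every row i
        quad = np.einsum("ij,jk,ik->i", actions, V_inv, actions)
        widths = np.sqrt(np.maximum(quad, 0.0))
        return int(np.argmax(means + self.beta * widths))

    def update(self, a, r):
        # Shrink all past statistics by gamma, then add the new observation.
        # Adding (1 - gamma) * lam * I keeps the ridge term equal to lam * I.
        dim = len(a)
        self.V = (self.gamma * self.V + np.outer(a, a)
                  + (1.0 - self.gamma) * self.lam * np.eye(dim))
        self.b = self.gamma * self.b + r * a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, horizon = 2, 2000
    agent = DiscountedLinUCBSketch(dim=d, gamma=0.98, lam=1.0, beta=0.5)
    for t in range(horizon):
        # Abrupt change: the unknown parameter switches halfway through
        theta_star = np.array([1.0, 0.0]) if t < horizon // 2 else np.array([0.0, 1.0])
        actions = rng.normal(size=(10, d))  # 10 random context vectors per round
        k = agent.select(actions)
        reward = actions[k] @ theta_star + 0.1 * rng.normal()
        agent.update(actions[k], reward)
    print("estimate at the end:", agent.theta_hat())
```

In this sketch, discounting the running sums in place gives an O(d^2) update per round instead of refitting on the whole history. With gamma = 0.98, observations older than a few multiples of 1 / (1 - gamma) = 50 rounds carry little weight, which lets the estimate follow the parameter after the switch in the demo.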

D-LinUCB, non-stationary environment, weighted linear bandit

