Weighted Linear Bandits for Non-Stationary Environments
Russac, Yoan, Vernade, Claire, Cappé, Olivier
Neural Information Processing Systems
We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary over time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments.
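The following is a minimal sketch of the discounted least-squares idea mentioned in the abstract, not the authors' implementation. It only illustrates how exponential weights let an estimator forget old observations. It omits the optimistic confidence bonus that a full D-LinUCB would add. The function names (`solve`, `discounted_least_squares`, `error`), the regularization parameter `lam`, and the toy data are all assumptions made for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def discounted_least_squares(actions, rewards, gamma, lam=1.0):
    """Regularized least squares where the observation at step s gets weight
    gamma ** (t - 1 - s), so older observations count less (assumed form)."""
    d = len(actions[0])
    t = len(actions)
    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for s, (a, r) in enumerate(zip(actions, rewards)):
        w = gamma ** (t - 1 - s)  # exponential forgetting weight
        for i in range(d):
            b[i] += w * r * a[i]
            for j in range(d):
                V[i][j] += w * a[i] * a[j]
    return solve(V, b)


def error(estimate, target):
    """Euclidean distance between an estimate and a target parameter."""
    return sum((x - y) ** 2 for x, y in zip(estimate, target)) ** 0.5


# Toy non-stationary problem (hypothetical): the parameter switches at step 50.
actions = [[1.0, 0.0] if s % 2 == 0 else [0.0, 1.0] for s in range(100)]
thetas = [[1.0, 0.0]] * 50 + [[0.0, 1.0]] * 50
rewards = [sum(x * y for x, y in zip(a, th)) for a, th in zip(actions, thetas)]

plain = discounted_least_squares(actions, rewards, gamma=1.0)       # no forgetting
discounted = discounted_least_squares(actions, rewards, gamma=0.9)  # forgets the past

print("error without discounting:", error(plain, [0.0, 1.0]))
print("error with discounting:   ", error(discounted, [0.0, 1.0]))
```

In this noiseless toy example, the undiscounted estimate averages over both regimes, while the discounted estimate tracks the current parameter more closely. The choice of `gamma` trades off how quickly the past is forgotten against how much data effectively remains.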
Mar-19-2020, 01:32:52 GMT