AITopics | weighted linear bandit

Weighted Linear Bandits for Non-Stationary Environments

Neural Information Processing Systems, Dec-25-2025, 03:51:29 GMT

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order $d^{2/3} B_T^{1/3} T^{2/3}$, where $B_T$ is a measure of non-stationarity ($d$ and $T$ being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.
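The discounted regression idea in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: the class name `DiscountedLeastSquares`, the method names, the default `gamma` and `lam` values, and the constant `beta` in the exploration bonus are all assumptions made for illustration. In particular, the bonus here is a generic optimistic width built from the discounted design matrix, not the confidence radius analyzed in the paper.

```python
import numpy as np


class DiscountedLeastSquares:
    """Ridge regression in which past observations are exponentially discounted.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, dim, gamma=0.95, lam=1.0):
        self.gamma = gamma          # discount factor in (0, 1]; 1 means no forgetting
        self.lam = lam              # ridge regularization strength
        self.V = lam * np.eye(dim)  # discounted design matrix plus regularizer
        self.b = np.zeros(dim)      # discounted sum of reward-weighted actions

    def update(self, action, reward):
        action = np.asarray(action, dtype=float)
        # Shrink old information by gamma, add the new observation, and top the
        # regularizer back up so that it stays equal to lam * I after each step.
        self.V = (self.gamma * self.V + np.outer(action, action)
                  + (1.0 - self.gamma) * self.lam * np.eye(len(action)))
        self.b = self.gamma * self.b + reward * action

    def estimate(self):
        # Weighted least-squares estimate of the current regression parameter.
        return np.linalg.solve(self.V, self.b)

    def optimistic_index(self, action, beta=1.0):
        # Estimated reward plus an exploration bonus; beta is a placeholder
        # constant, not the confidence radius derived in the paper.
        action = np.asarray(action, dtype=float)
        width = np.sqrt(action @ np.linalg.solve(self.V, action))
        return action @ self.estimate() + beta * width


def demo(seed=0, horizon=400, dim=2):
    """Simulate one abrupt parameter change and return the final estimate."""
    rng = np.random.default_rng(seed)
    model = DiscountedLeastSquares(dim, gamma=0.95, lam=1.0)
    theta = np.array([1.0, 0.0])
    for t in range(horizon):
        if t == horizon // 2:
            theta = np.array([0.0, 1.0])  # abrupt change of the parameter
        action = rng.normal(size=dim)
        reward = action @ theta + 0.1 * rng.normal()
        model.update(action, reward)
    return model.estimate()
```

Calling `demo()` returns an estimate that follows the post-change parameter rather than an average of both regimes, because observations from before the change carry a weight that decays geometrically with their age. A discount factor closer to 1 forgets more slowly and suits slowly-varying environments, at the cost of slower adaptation to abrupt changes.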

name change, non-stationary environment, weighted linear bandit, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)

Weighted Linear Bandits for Non-Stationary Environments

Neural Information Processing Systems, Oct-2-2025, 09:52:54 GMT

In this setting, the unknown regression parameter is allowed to vary in time.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Reviews: Weighted Linear Bandits for Non-Stationary Environments

Neural Information Processing Systems, Jan-22-2025, 09:44:41 GMT

Update (after reading the rebuttals): After reading the authors' rebuttal, my concerns about the novelty of the new self-normalized concentration result have been addressed, since the key point is that the coefficient of the regularizer is changing. I do appreciate this work. The idea of this paper is natural, but there are real technical challenges, and the authors address them elegantly. So I think it deserves acceptance. Nevertheless, there are still many typos in the current version besides those listed before, for example, in Theorem 2, eq.

bandit, concentration, self-normalized concentration, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

Weighted Linear Bandits for Non-Stationary Environments

Neural Information Processing Systems, Oct-9-2024, 17:20:23 GMT

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments.

d-linucb, non-stationary environment, weighted linear bandit

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

Weighted Linear Bandits for Non-Stationary Environments

Russac, Yoan, Vernade, Claire, Cappé, Olivier

Neural Information Processing Systems, Mar-19-2020, 01:32:52 GMT

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments.

d-linucb, non-stationary environment, weighted linear bandit

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
