An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

Kihyuk Hong, Yuhang Li, Ambuj Tewari

arXiv.org Artificial Intelligence 

The linear bandit (LB) problem [1] and the kernel bandit (KB) problem [2] are important paradigms for sequential decision making under uncertainty. They extend the multi-armed bandit (MAB) problem [3] by modeling the reward as a function of side information about each arm, provided as a feature vector. LB assumes the reward function is linear in the feature vector. KB extends LB to model non-linearity by assuming the reward function lies in a reproducing kernel Hilbert space (RKHS) induced by a kernel. A recent line of work studies non-stationary variants of LB and KB, in which the reward function can vary over time subject to one of two main types of non-stationarity budget: the number of changes and the total variation of the sequence of reward functions. A common design principle for adapting to non-stationarity is forgetting the past. It has been applied to the non-stationary MAB to design nearly minimax optimal algorithms [4, 5], and similarly to the non-stationary LB [6-9] and the non-stationary KB [10, 11]. Recently, Zhao et al. [12] found an error in a key technical lemma of Cheung et al. [6] that affects the concentration bound for regression-based reward estimates under non-stationarity.
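
As a point of reference (the notation here is illustrative; the exact definitions and choice of norm vary across the cited works), for a sequence of reward functions $f_1, \dots, f_T$ the two budgets are commonly written as
$$
P_T \;=\; 1 + \sum_{t=2}^{T} \mathbb{1}\{f_t \neq f_{t-1}\},
\qquad
V_T \;=\; \sum_{t=2}^{T} \lVert f_t - f_{t-1} \rVert,
$$
where $P_T$ counts the number of change points and $V_T$ measures the total variation of the sequence, with the norm typically taken to be the sup-norm over arms or the RKHS norm.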
