Open Problem: Tight Online Confidence Intervals for RKHS Elements
Sattar Vakili, Jonathan Scarlett, Tara Javidi
Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel-based bandit and reinforcement learning problems utilizes confidence intervals applicable to elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the existing regret bounds for several kernelized bandit algorithms (e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof, and the main challenge seems to stem from the online (sequential) nature of the observation points.
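As an illustration of the object under discussion, the confidence intervals in question take the form mu_t(x) +/- beta_t * sigma_t(x), where mu_t and sigma_t are the posterior mean and standard deviation of Gaussian process regression on the observations so far, and beta_t is the confidence-width parameter whose tight choice is exactly what this open problem concerns. The sketch below is a minimal, hypothetical implementation (kernel choice, `beta`, `noise`, and `lengthscale` are illustrative assumptions, not taken from the source):

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2)).
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2 * lengthscale ** 2))


def gp_confidence_interval(X, y, x_query, beta=2.0, noise=0.1, lengthscale=1.0):
    """Return (lower, upper) of mu(x) +/- beta * sigma(x) at the query points.

    `beta` is a placeholder confidence-width parameter; choosing it tightly
    for RKHS members under sequential observations is the open problem.
    """
    n = len(X)
    K = rbf_kernel(X, X, lengthscale) + noise ** 2 * np.eye(n)
    k_star = rbf_kernel(X, x_query, lengthscale)          # shape (n, m)
    # Posterior mean: k_*^T (K + noise^2 I)^{-1} y
    mu = k_star.T @ np.linalg.solve(K, y)
    # Posterior variance: k(x, x) - k_*^T (K + noise^2 I)^{-1} k_*
    K_inv_kstar = np.linalg.solve(K, k_star)
    var = rbf_kernel(x_query, x_query, lengthscale).diagonal() \
        - np.sum(k_star * K_inv_kstar, axis=0)
    sigma = np.sqrt(np.maximum(var, 0.0))                 # clip tiny negatives
    return mu - beta * sigma, mu + beta * sigma


# Usage: the interval is narrow near observed points and wide away from them.
X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()
query = np.array([[0.0], [1.5]])
lower, upper = gp_confidence_interval(X, y, query)
```

In kernelized bandit algorithms such as GP-UCB, the next observation point is chosen by maximizing `upper`; the looseness of the known bounds on `beta` as the number of sequentially chosen points grows is what makes the resulting regret bounds potentially non-sublinear.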
Oct-28-2021