High-dimensional Nonparametric Contextual Bandit Problem
Iwazaki, Shogo, Komiyama, Junpei, Imaizumi, Masaaki
We consider the kernelized contextual bandit problem with a large feature space. This problem involves $K$ arms, and the goal of the forecaster is to maximize the cumulative rewards through learning the relationship between the contexts and the rewards. It serves as a general framework for various decision-making scenarios, such as personalized online advertising and recommendation systems. Kernelized contextual bandits generalize the linear contextual bandit problem and offers a greater modeling flexibility. Existing methods, when applied to Gaussian kernels, yield a trivial bound of $O(T)$ when we consider $Ω(\log T)$ feature dimensions. To address this, we introduce stochastic assumptions on the context distribution and show that no-regret learning is achievable even when the number of dimensions grows up to the number of samples. Furthermore, we analyze lenient regret, which allows a per-round regret of at most $Δ> 0$. We derive the rate of lenient regret in terms of $Δ$.
May-21-2025
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- United States
- New York (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Canada > British Columbia
- United States
- Europe > Italy
- Asia
- Middle East > Israel
- Haifa District > Haifa (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Middle East > Israel
- Oceania > Australia
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Marketing (0.88)
- Technology: