Variance-Aware Sparse Linear Bandits

Yan Dai, Ruosong Wang, Simon S. Du

arXiv.org Artificial Intelligence 

It is well known that for sparse linear bandits, when ignoring the dependency on the sparsity (which is much smaller than the ambient dimension), the worst-case minimax regret is Θ̃(√(dT)), where d is the ambient dimension and T is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve Õ(1) regret, which is (nearly) independent of d and T. This paper develops a variance-aware regret guarantee that scales with the per-round noise variances σ_t² rather than with T alone; this bound naturally interpolates between the regret bounds for the worst-case constant-variance regime (i.e., σ_t = Ω(1) for all t) and the benign noiseless regime (i.e., σ_t = 0 for all t). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm into a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold: the first algorithm can handle unknown-variance cases, and the second one is more efficient.

This paper studies the sparse linear stochastic bandit problem, a special case of linear stochastic bandits. In linear bandits (Dani et al., 2008), the agent faces a sequential decision-making problem lasting for T rounds. Dani et al. (2008) proved that the minimax optimal regret for linear bandits is Θ̃(d√T) when the noises are independent Gaussian random variables with mean 0 and variance 1 and both θ* and the actions are bounded in norm. In real-world applications such as recommendation systems, only a few features may be relevant despite a large candidate feature space. In other words, the high-dimensional linear regime may actually admit a low-dimensional structure.
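The interpolation claim can be made concrete. As a sketch, suppose the variance-aware guarantee scales with the quantity √(Σ_t σ_t²) up to sparsity and logarithmic factors (an assumption about its form, since the exact bound is not stated above); the two regimes then follow directly:

```latex
% Constant-variance regime (\sigma_t = 1 for all t): the variance term
% recovers a \sqrt{T}-type worst-case dependence on the horizon.
\sqrt{\sum_{t=1}^{T} \sigma_t^2} = \sqrt{T}

% Noiseless regime (\sigma_t = 0 for all t): the variance term vanishes,
% leaving regret (nearly) independent of T, matching the benign
% \widetilde{O}(1) divide-and-conquer case.
\sqrt{\sum_{t=1}^{T} \sigma_t^2} = 0
```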
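To fix notation, the interaction protocol described above can be sketched as follows. This is a minimal illustrative environment, not the paper's algorithm; the names (`step`, `theta`) and the choice of noise model are assumptions for the sketch.

```python
import numpy as np

# Hypothetical sketch of the sparse linear bandit protocol:
# a hidden s-sparse parameter theta in R^d, actions on the unit sphere,
# and a noisy linear reward with round-dependent noise level sigma_t.
rng = np.random.default_rng(0)

d, s, T = 50, 3, 1000            # ambient dimension, sparsity, rounds
theta = np.zeros(d)              # only s of the d coordinates are nonzero
theta[rng.choice(d, size=s, replace=False)] = rng.uniform(-1, 1, size=s)
theta /= max(np.linalg.norm(theta), 1.0)   # keep ||theta||_2 <= 1

def step(action, sigma_t):
    """Play an action; observe a noisy linear reward.

    sigma_t is the noise standard deviation at this round:
    sigma_t = 0 recovers the benign noiseless regime, while
    sigma_t = 1 for all t is the constant-variance worst case.
    """
    return float(action @ theta) + sigma_t * rng.standard_normal()

# Example round: a random unit-norm action under unit-variance noise.
a = rng.standard_normal(d)
a /= np.linalg.norm(a)
reward = step(a, sigma_t=1.0)
```

With `sigma_t=0` the reward is exactly the inner product ⟨a, θ⟩, which is the setting where divide-and-conquer achieves (nearly) dimension- and horizon-independent regret.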
