sparse linear bandit
Information Directed Sampling for Sparse Linear Bandits
Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which naturally balances the information-regret trade-off. We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances, demonstrating the adaptivity of IDS. To efficiently implement sparse IDS, we propose an empirical Bayesian approach for sparse posterior sampling using a spike-and-slab Gaussian-Laplace prior. Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
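IDS proper optimizes the regret-information trade-off over randomized action distributions; a common simplification selects the single action minimizing the squared-regret-to-information ratio, with both quantities estimated from posterior samples. The sketch below is a minimal illustration of that deterministic variant (not the paper's implementation), using the variance-based information-gain proxy; the function name, sample counts, and `noise_var` default are illustrative choices.

```python
import numpy as np

def ids_action(theta_samples, actions, noise_var=1.0):
    """Deterministic IDS sketch from posterior samples.

    theta_samples: (M, d) array of posterior draws of the parameter.
    actions: (K, d) array of candidate actions.
    """
    rewards = theta_samples @ actions.T          # (M, K) sampled mean rewards
    opt = rewards.argmax(axis=1)                 # optimal arm under each draw
    mean_r = rewards.mean(axis=0)                # E[theta . a] per arm
    # expected instantaneous regret of each arm
    delta = rewards.max(axis=1).mean() - mean_r
    # variance-based proxy for the information gain about the optimal arm:
    # g_a = sum_{a*} P(A*=a*) * (E[theta.a | A*=a*] - E[theta.a])^2 / noise_var
    g = np.zeros(len(actions))
    for astar in np.unique(opt):
        mask = opt == astar
        g += mask.mean() * (rewards[mask].mean(axis=0) - mean_r) ** 2 / noise_var
    # pick the arm minimizing the (squared-regret / information) ratio
    ratio = delta ** 2 / np.maximum(g, 1e-12)
    return int(ratio.argmin())
```

Posterior draws would come from the paper's spike-and-slab sampler; here any sparse sample matrix can stand in to exercise the selection rule.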
PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits
Jang, Kyoungseok; Zhang, Chicheng; Jun, Kwang-Sung
In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedback, and the reward function depends linearly on a few coordinates of the covariates of the actions. This has applications in many real-world sequential decision-making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms that enjoy improved regret upper bounds over the state of the art (Hao et al., 2020), especially w.r.t. the geometry of the given action set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between upper and lower bounds in prior work.
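PopArt itself is not reproduced here; as a point of reference, the Lasso estimator it is compared against can be sketched via proximal gradient descent (ISTA) in a few lines. The regularization strength `lam` and iteration count below are illustrative choices, not tuned values.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of the l1 norm
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """ISTA for the Lasso: minimize 0.5/n * ||y - X w||^2 + lam * ||w||_1."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n    # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w
```

On a well-conditioned design with a few large true coefficients, the iterates shrink irrelevant coordinates to exactly zero while retaining the support, which is the $\ell_1$-recovery behavior PopArt's guarantee tightens.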
Variance-Aware Sparse Linear Bandits
Dai, Yan; Wang, Ruosong; Du, Simon S.
It is well known that for sparse linear bandits, when ignoring the dependency on sparsity (which is much smaller than the ambient dimension), the worst-case minimax regret is $\widetilde{\Theta}(\sqrt{dT})$, where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{O}(1)$ regret, which is (nearly) independent of $d$ and $T$. This bound naturally interpolates the regret bounds for the worst-case constant-variance regime (i.e., $\sigma_t \equiv \sigma$) and the benign deterministic regime. To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second one is more efficient.

This paper studies the sparse linear stochastic bandit problem, which is a special case of linear stochastic bandits. In linear bandits (Dani et al., 2008), the agent faces a sequential decision-making problem lasting for $T$ rounds. Dani et al. (2008) proved that the minimax optimal regret for linear bandits is $\widetilde{\Theta}(d\sqrt{T})$ when the noises are independent Gaussian random variables with means 0 and variances 1 and both the parameter vector and the actions have bounded norms. In real-world applications such as recommendation systems, only a few features may be relevant despite a large candidate feature space. In other words, the high-dimensional linear regime may actually admit a low-dimensional structure.
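The abstract describes the black-box conversion only at a high level. One plausible two-phase shape for such a reduction — explore uniformly, estimate the sparse support, then hand the restricted low-dimensional problem to the linear bandit algorithm — can be sketched as follows. This is a hypothetical illustration, not the paper's construction: the support estimator here is plain hard-thresholded least squares, and `linear_bandit` is an arbitrary callable standing in for the black-box algorithm.

```python
import numpy as np

def estimate_support(X, y, s):
    # hard-thresholded least squares: keep the s largest coefficients
    # (a real reduction would use a sharper sparse estimator)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sort(np.argsort(np.abs(w))[-s:])

def sparse_to_linear(actions, pull, s, n_explore, linear_bandit, seed=0):
    """Phase 1: pull uniformly random actions to estimate the s relevant
    coordinates. Phase 2: delegate the restricted s-dimensional problem
    to a black-box linear bandit algorithm."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(actions), size=n_explore)
    X = actions[idx]
    y = np.array([pull(i) for i in idx])
    support = estimate_support(X, y, s)
    return linear_bandit(actions[:, support], support)
```

Passing an identity callable for `linear_bandit` exposes the estimated support directly, which is convenient for checking the exploration phase in isolation.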