
Bandits

Neural Information Processing Systems

For each arm $a$, let $r(a)$ and $c_j(a)$ be, respectively, the mean reward and the mean consumption of resource $j$, i.e., $(r(a); c_1(a), \ldots, c_d(a)) := \mathbb{E}_{o \sim D_a}[o]$. We sometimes write $r = (r(a) : a \in [K])$ and $c_j = (c_j(a) : a \in [K])$ as vectors over arms. Second, we use a tighter version of Eq. (3.6) (see Appendix D.3):
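Because $r(a)$ and $c_j(a)$ are defined as coordinates of the expected outcome vector $\mathbb{E}_{o \sim D_a}[o]$, they have a natural plug-in estimate: the per-arm average of observed outcomes. The sketch below assumes each observation is stored as a row $o = (\text{reward}, \text{consumption}_1, \ldots, \text{consumption}_d)$ and that every arm has been pulled at least once. The function name `empirical_means` and the sample data are hypothetical, not taken from the paper.

```python
import numpy as np


def empirical_means(outcomes_by_arm):
    """Plug-in estimates of r(a) and c_j(a) for every arm a in [K].

    outcomes_by_arm: list of K arrays; entry a has shape (n_a, 1 + d) and each
    row is an observed outcome o = (reward, consumption_1, ..., consumption_d).
    Returns r_hat with shape (K,) and c_hat with shape (d, K), so that
    c_hat[j - 1] is the vector (c_j(a) : a in [K]) over arms.
    """
    # Average each arm's observed outcome vectors: an estimate of E_{o ~ D_a}[o].
    means = np.stack([np.asarray(o, dtype=float).mean(axis=0) for o in outcomes_by_arm])
    r_hat = means[:, 0]     # first coordinate: mean reward of each arm
    c_hat = means[:, 1:].T  # remaining coordinates: mean consumption per resource
    return r_hat, c_hat


# Two arms, d = 2 resources (made-up observations for illustration only).
outcomes = [
    np.array([[1.0, 0.2, 0.0], [0.0, 0.4, 1.0]]),
    np.array([[0.5, 1.0, 0.5]]),
]
r_hat, c_hat = empirical_means(outcomes)
# r_hat is [0.5, 0.5]; c_hat is approximately [[0.3, 1.0], [0.5, 0.5]]
```

Transposing the consumption block keeps the convention from the text, where each $c_j$ is a vector indexed by arms.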



Information Directed Sampling for Sparse Linear Bandits

Neural Information Processing Systems

We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances, demonstrating the adaptivity of IDS. To implement sparse IDS efficiently, we propose an empirical Bayesian approach for sparse posterior sampling using a spike-and-slab Gaussian-Laplace prior. Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
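IDS chooses actions by trading off expected regret against information gained about the optimal action. The following is a minimal sketch of one generic, swapped-in variant, not the authors' sparse IDS. It makes three assumptions. First, posterior draws of a linear parameter are assumed to be available already; the paper's spike-and-slab Gaussian-Laplace sampler is not reproduced. Second, the information gain is replaced by the variance surrogate $v(a) = \sum_{a^*} \Pr(A^* = a^*)\,(\mathbb{E}[\mu_a \mid A^* = a^*] - \mathbb{E}[\mu_a])^2$. Third, the ratio $\Delta(a)^2 / v(a)$ is minimized over single actions instead of over action distributions. The function name `variance_ids_action` and the `eps` guard are hypothetical.

```python
import numpy as np


def variance_ids_action(theta_samples, actions, eps=1e-12):
    """Deterministic, variance-based IDS step from posterior samples (illustrative).

    theta_samples: (M, d) array of posterior draws of the linear parameter theta.
    actions: (K, d) array of action feature vectors.
    Returns (chosen action index, expected regret per action, variance surrogate per action).
    """
    means = theta_samples @ actions.T        # (M, K): mean reward of each action under each draw
    best = means.argmax(axis=1)              # optimal action A* under each draw
    mu = means.mean(axis=0)                  # posterior mean reward of each action
    delta = means.max(axis=1).mean() - mu    # expected instantaneous regret Delta(a)

    # Variance surrogate for information gain about A*.
    v = np.zeros(actions.shape[0])
    for a_star in np.unique(best):
        mask = best == a_star
        p = mask.mean()                      # estimated P(A* = a_star)
        cond_mu = means[mask].mean(axis=0)   # E[mu_a | A* = a_star] for every action a
        v += p * (cond_mu - mu) ** 2

    # Information ratio; eps avoids division by zero for uninformative actions.
    ratio = delta ** 2 / np.maximum(v, eps)
    return int(ratio.argmin()), delta, v


# Two posterior draws disagree on the best arm; action 2 sits between them.
theta_samples = np.array([[1.0, 0.0], [0.0, 1.0]])
actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
idx, delta, v = variance_ids_action(theta_samples, actions)
# All three actions have expected regret 0.5, but action 2 reveals nothing
# about A* (v = 0), so it gets a huge ratio and action 0 is chosen.
```

Restricting the minimization to single actions keeps the sketch short. The full IDS objective is optimized over randomized policies, so this deterministic version can choose differently in some instances.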