Bandits

Neural Information Processing Systems 

For each arm $a$, let $r(a)$ and $c_j(a)$ be, respectively, the mean reward and mean resource-$j$ consumption, i.e., $(r(a); c_1(a), \ldots, c_d(a)) := \mathbb{E}_{o \sim D_a}[o]$. We sometimes write $r = (r(a) : a \in [K])$ and $c_j = (c_j(a) : a \in [K])$ as vectors over arms. Second, we use a tighter version of Eq. (3.6) (see Appendix D.3):
