On the Suboptimality of Thompson Sampling in High Dimensions

Neural Information Processing Systems 

We assume that $(Z(t))_{t \ge 1}$ are i.i.d., and that $Z_1(t), \dots, Z_d(t)$ are independent and distributed as $Z_i(t) \sim \mathrm{Bernoulli}(\theta_i)$ for all $t, i$. Then the learner receives a reward $f(x(t), Z(t))$, where $f$ is a known function.
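
As a rough illustration of this reward model, the following sketch simulates a single round. The text only says that f is known, so the linear form f(x, Z) = sum_i x_i Z_i used here is an assumption made for the example. The names `sample_outcome`, `linear_reward`, and `play_round` are hypothetical and do not come from the paper.

```python
import random


def sample_outcome(theta, rng):
    """Draw Z(t): independent coordinates with Z_i(t) ~ Bernoulli(theta_i)."""
    return [1 if rng.random() < p else 0 for p in theta]


def linear_reward(x, z):
    """Assumed example of the known function f: f(x, z) = sum_i x_i * z_i."""
    return sum(xi * zi for xi, zi in zip(x, z))


def play_round(x, theta, rng, f=linear_reward):
    """Simulate one round: the learner plays x, the environment draws Z(t),
    and the learner receives the reward f(x, Z(t))."""
    z = sample_outcome(theta, rng)
    return f(x, z), z


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so the run is reproducible
    theta = [0.2, 0.5, 0.9]     # illustrative parameters, unknown to the learner
    x = [1, 0, 1]               # illustrative decision x(t)
    reward, z = play_round(x, theta, rng)
    print("Z(t) =", z, "reward =", reward)
```

Any other known function can be passed through the `f` argument. The sampling step stays the same, because the draws Z(t) do not depend on the learner's decision.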
