On the Suboptimality of Thompson Sampling in High Dimensions
–Neural Information Processing Systems
We assume that $(Z(t))_{t \ge 1}$ are i.i.d., and that $Z_1(t), \dots, Z_d(t)$ are independent and distributed as $Z_i(t) \sim \mathrm{Bernoulli}(\theta_i)$ for all $t, i$. Then the learner receives a reward $f(x(t), Z(t))$, where $f$ is a known function.
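The outcome model above can be illustrated with a minimal simulation sketch. The excerpt only states that the reward function is known, so the linear choice below (reward equal to the inner product of the decision and the outcome vector) is an assumption made purely for illustration, as are the names `sample_outcomes` and `linear_reward` and the example values of the parameter vector and the decision.

```python
import random


def sample_outcomes(theta, rng):
    """Draw one outcome vector Z(t) with independent Bernoulli(theta_i) coordinates."""
    return [1 if rng.random() < p else 0 for p in theta]


def linear_reward(x, z):
    """Hypothetical reward f(x, Z) = sum_i x_i * Z_i (an assumed form, not from the source)."""
    return sum(xi * zi for xi, zi in zip(x, z))


rng = random.Random(0)           # seeded for reproducibility
theta = [0.2, 0.5, 0.9]          # assumed Bernoulli means, d = 3
x = [1, 0, 1]                    # assumed binary decision x(t)

z = sample_outcomes(theta, rng)  # one round's outcome vector Z(t)
reward = linear_reward(x, z)     # reward received by the learner this round
```

Repeating the last two lines with the same `theta` produces i.i.d. rounds, which matches the assumption that the outcome vectors are i.i.d. over time.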
Feb-8-2026, 10:55:54 GMT