Appendices

Neural Information Processing Systems 

The proof closely follows [Agrawal and Goyal, 2017, Theorem 1.1] and is adapted to the batch setting. Without loss of generality, we assume that a = 1 is the optimal arm. First, note that in the batch algorithm B-TS (Algorithm 1), θ_a(t) is defined based on F_{B(t)}. In the first equality, given F_{B(t)}, the random variable θ_1(t) is independent of all other θ_j(t), j ≠ 1, and of E^θ_a(t). The difference between this argument and that of [Agrawal and Goyal, 2017, Theorem 1.1] is that the conditioning is on the history up to the last time the B-TS algorithm queried a batch, i.e., B(t), rather than on the history up to the previous round.
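To make the role of the conditioning concrete, the following is a minimal sketch of a batched Thompson sampler in which every sample θ_a(t) is drawn from a posterior frozen at the last batch end B(t). It is not a reproduction of Algorithm 1: the Bernoulli rewards, the Beta(1, 1) priors, the function name `batched_thompson_sampling`, and the explicit `batch_ends` schedule are all assumptions made for illustration.

```python
import random


def batched_thompson_sampling(means, horizon, batch_ends, seed=0):
    """Illustrative batched Thompson sampling on a Bernoulli bandit.

    Assumptions (not taken from the source): Bernoulli rewards with the
    given means, Beta(1, 1) priors, and a user-supplied set of rounds at
    which a batch ends and the posterior is refreshed.
    """
    rng = random.Random(seed)
    k = len(means)
    # Counts visible to the sampler: frozen at the last batch end B(t).
    seen_succ = [0] * k
    seen_fail = [0] * k
    # Counts gathered during the current batch, not yet visible.
    pend_succ = [0] * k
    pend_fail = [0] * k
    pulls = [0] * k
    batch_ends = set(batch_ends)

    for t in range(1, horizon + 1):
        # theta_a(t) depends only on the information available at B(t).
        theta = [
            rng.betavariate(seen_succ[a] + 1, seen_fail[a] + 1)
            for a in range(k)
        ]
        arm = max(range(k), key=theta.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        if reward:
            pend_succ[arm] += 1
        else:
            pend_fail[arm] += 1

        if t in batch_ends:
            # Batch queried: fold the pending observations into the
            # visible counts and start a new batch.
            for a in range(k):
                seen_succ[a] += pend_succ[a]
                seen_fail[a] += pend_fail[a]
                pend_succ[a] = 0
                pend_fail[a] = 0

    return pulls, seen_succ, seen_fail


if __name__ == "__main__":
    # Hypothetical instance: two arms, batches ending every 50 rounds.
    pulls, succ, fail = batched_thompson_sampling(
        [0.9, 0.1], horizon=500, batch_ends=range(50, 501, 50)
    )
    print(pulls, succ, fail)
```

Within a batch the visible counts do not change, so the samples for different arms are drawn independently from fixed distributions; this is the independence, given F_{B(t)}, that the argument above relies on.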
