Appendices armedBandit
Neural Information Processing Systems
Without loss of generality, we assume that $a = 1$ is the optimal arm. First, note that in the batch algorithm B-TS (Algorithm 1), $\theta_a(t)$ is defined based on $F_{B(t)}$. In the first equality, given $F_{B(t)}$, the random variable $\theta_1(t)$ is independent of all other $\theta_j(t)$ and of $E^{\theta}_a(t)$. The proof closely follows [Agrawal and Goyal, 2017, Theorem 1.1] and is adapted to the batch setting. The difference between this argument and that of [Agrawal and Goyal, 2017, Theorem 1.1] is that the conditioning extends only up to the last time the B-TS algorithm queried a batch, i.e., $B(t)$.
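To make the conditioning concrete, here is a minimal sketch of a batched Thompson sampler for Bernoulli rewards. It is not the authors' Algorithm 1: the Beta(1, 1) priors, the per-arm doubling rule that triggers a batch query, and all names (`batch_thompson_sampling`, `next_flush`, `pending_s`, `pending_f`) are assumptions made for illustration. What it shows is that each round's samples are drawn from a posterior frozen at the last batch query, so they are independent across arms given the history up to that time.

```python
import random


def batch_thompson_sampling(means, horizon, seed=0):
    """Illustrative batched Thompson sampling on Bernoulli arms.

    Assumption (not from the source): a batch is queried whenever the pull
    count of the arm just played reaches its next power-of-two threshold.
    """
    rng = random.Random(seed)
    n_arms = len(means)

    # Posterior parameters visible to the sampler. They change only at
    # batch queries, i.e. they encode the history up to the last query.
    alpha = [1] * n_arms
    beta = [1] * n_arms

    # Outcomes collected since the last batch query (not yet visible).
    pending_s = [0] * n_arms
    pending_f = [0] * n_arms

    pulls = [0] * n_arms
    next_flush = [1] * n_arms  # hypothetical doubling thresholds
    batch_times = []

    for t in range(1, horizon + 1):
        # Given the frozen posterior, the samples are drawn independently
        # across arms.
        theta = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: theta[a])

        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        pending_s[arm] += reward
        pending_f[arm] += 1 - reward

        # Batch query: reveal all pending outcomes and update the posterior.
        if pulls[arm] >= next_flush[arm]:
            for a in range(n_arms):
                alpha[a] += pending_s[a]
                beta[a] += pending_f[a]
                pending_s[a] = 0
                pending_f[a] = 0
            next_flush[arm] *= 2
            batch_times.append(t)

    return pulls, batch_times


if __name__ == "__main__":
    pulls, batch_times = batch_thompson_sampling([0.9, 0.5, 0.4], horizon=1000)
    print("pulls per arm:", pulls)
    print("number of batch queries:", len(batch_times))
```

Under this assumed doubling rule, each arm triggers at most about log2(T) + 1 batch queries over a horizon T, so between queries the sampler reuses the same posterior for many rounds.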