Appendices armedBandit

Feb-8-2026, 18:42:42 GMT–Neural Information Processing Systems

Without loss of generality, we assume that a = 1 is the optimal arm. First, note that in the batch algorithm B-TS (Algorithm 1), θa(t) is defined based on FB(t). In the first equality, given FB(t), the random variable θ1(t) is independent of all other θj(t) and of Eθa(t). The proof closely follows [Agrawal and Goyal, 2017, Theorem 1.1] and is adapted to the batch setting. The difference between this argument and that of [Agrawal and Goyal, 2017, Theorem 1.1] is that the conditioning extends only up to the last time the B-TS algorithm queried a batch, i.e., B(t).
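To illustrate what conditioning only up to the last batch query B(t) means in practice, here is a minimal Python sketch of Thompson sampling for Bernoulli rewards in which the posterior is refreshed only at batch ends. It is not Algorithm 1 from the paper: the function name `batch_thompson_sampling`, the Beta(1, 1) prior, and the caller-supplied `batch_ends` schedule are all assumptions made for illustration. Arms are 0-indexed here, so the optimal arm a = 1 of the text corresponds to index 0.

```python
import random


def batch_thompson_sampling(true_means, horizon, batch_ends, seed=0):
    """Hypothetical sketch: Bernoulli Thompson sampling with batched feedback.

    Rewards observed inside a batch are buffered and only folded into the
    posterior at the rounds listed in `batch_ends` (an assumed schedule).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Counts visible to the sampler: information up to the last batch end B(t).
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    pending = []  # (arm, reward) pairs from the current, unfinished batch
    batch_ends = set(batch_ends)

    for t in range(1, horizon + 1):
        # Each theta_a(t) is drawn independently from a Beta posterior that
        # uses only the data revealed at the last batch end.
        theta = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: theta[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        pending.append((arm, reward))
        pulls[arm] += 1

        # At a batch end, reveal the buffered rewards and update the posterior.
        if t in batch_ends:
            for a, r in pending:
                if r:
                    successes[a] += 1
                else:
                    failures[a] += 1
            pending.clear()

    return pulls, successes, failures
```

For example, `batch_thompson_sampling([0.9, 0.1], 200, [10, 50, 200])` runs 200 rounds with three posterior updates. Between updates the samples θa(t) depend only on the counts frozen at the previous batch end, which is the property the conditioning on FB(t) relies on.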

