Appendix
Neural Information Processing Systems
When the expected rewards of all arms are the same, ties are broken in favor of the arm with the lowest index, so the first K pulls are $\pi_1 = 1, \dots, \pi_K = K$. We complete the proof by induction. Suppose that the greedy pull sequence is periodic, with $\pi_1 = 1, \dots, \pi_K = K$ and $\pi_{t+K} = \pi_t$, up to time $h > K$. We show that $\pi_{h+1} = 1$ if $\pi_h = K$, and $\pi_{h+1} = \pi_h + 1$ otherwise. Write $h = nK + k_0$ with $0 \le k_0 < K$. When $k_0 = 0$ (i.e., $\pi_h = K$), every arm has been pulled exactly $n$ times as of time $h$. Therefore, by (3), arm 1 has the highest expected reward at time $h + 1$ and is chosen.
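The induction can be sanity-checked numerically with a small simulation. This is a minimal sketch under a stated assumption: equation (3) is not reproduced here, so the code stands in a hypothetical shared reward curve `reward(n)`, the expected reward of an arm that has already been pulled `n` times, taken to be identical across arms and strictly decreasing in `n`. That is the property the step above relies on, since arms with fewer pulls must look strictly better. The names `greedy_pulls` and `is_round_robin` are illustrative, not from the paper.

```python
from typing import Callable, List


def greedy_pulls(K: int, T: int, reward: Callable[[int], float]) -> List[int]:
    """Greedy pull sequence (arms numbered 1..K) over T rounds.

    Assumption (hypothetical stand-in for equation (3)): every arm shares the
    same expected-reward curve reward(n), where n is that arm's pull count.
    Ties are broken toward the lowest arm index.
    """
    counts = [0] * K  # counts[i] = number of times arm i+1 has been pulled
    seq: List[int] = []
    for _ in range(T):
        values = [reward(c) for c in counts]
        # list.index returns the first maximizer, i.e. the lowest-index arm on ties
        arm = values.index(max(values))
        counts[arm] += 1
        seq.append(arm + 1)
    return seq


def is_round_robin(seq: List[int], K: int) -> bool:
    """True if seq starts 1, ..., K and satisfies seq[t + K] == seq[t]."""
    if seq[:K] != list(range(1, K + 1)):
        return False
    return all(seq[t + K] == seq[t] for t in range(len(seq) - K))


if __name__ == "__main__":
    # Strictly decreasing curve: greedy cycles through the arms in index order.
    print(greedy_pulls(3, 9, lambda n: 1.0 / (n + 1)))
    # Increasing curve (violates the assumption): greedy sticks with arm 1.
    print(greedy_pulls(3, 6, lambda n: float(n)))
```

With `reward(n) = 1/(n + 1)` and K = 3, the first run yields `[1, 2, 3, 1, 2, 3, 1, 2, 3]`: whenever all counts are equal (the $k_0 = 0$ case) the tie goes to arm 1, and otherwise the lowest-index arm with one fewer pull wins. The second run shows that the periodic pattern depends on the assumed monotonicity.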
May-1-2026, 01:52:32 GMT