When the expected rewards of all arms are the same, the arm with the lowest index is chosen, so the first $K$ pulls are $\pi_1 = 1, \dots, \pi_K = K$. We complete the proof by induction. Suppose that the greedy pull sequence is periodic, with $\pi_1 = 1, \dots, \pi_K = K$ and $\pi_{t+K} = \pi_t$, until time $h > K$. Write $h = nK + k_0$ with $0 \le k_0 < K$. We will show that $\pi_{h+1} = 1$ if $\pi_h = K$ and $\pi_{h+1} = \pi_h + 1$ otherwise. When $k_0 = 0$ (i.e., $\pi_h = K$), all arms have been pulled exactly $n$ times as of time $h$. Therefore, by (3), at time $h + 1$, arm 1 has the highest expected reward and will be chosen.
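The periodic pattern claimed above can be checked numerically with a small simulation. The following is only a sketch under stated assumptions: equation (3) is not reproduced in this section, so the function `reward` below is a hypothetical stand-in in which every arm shares the same expected reward, strictly decreasing in the number of times that arm has been pulled. The names `greedy_pull_sequence`, `num_arms`, `horizon`, and `reward` are illustrative and do not come from the paper.

```python
def greedy_pull_sequence(num_arms, horizon, reward=lambda pulls: 1.0 / (1 + pulls)):
    """Simulate a greedy policy with lowest-index tie-breaking.

    ASSUMPTION: `reward` is a hypothetical stand-in for equation (3).
    All arms share it, and it strictly decreases with the pull count.
    Returns the pull sequence using 1-based arm indices.
    """
    counts = [0] * num_arms  # number of pulls of each arm so far
    sequence = []
    for _ in range(horizon):
        expected = [reward(c) for c in counts]
        # list.index returns the first maximiser, i.e. the lowest-index arm
        arm = expected.index(max(expected))
        counts[arm] += 1
        sequence.append(arm + 1)
    return sequence


if __name__ == "__main__":
    K = 3
    seq = greedy_pull_sequence(K, 10)
    print(seq)
    # Periodicity check: pi_{t+K} == pi_t for every t in range
    print(all(seq[t + K] == seq[t] for t in range(len(seq) - K)))
```

Under this assumed reward model, the sequence cycles through arms $1, \dots, K$ in order, which matches the inductive step: once every arm has been pulled $n$ times, the arms tie and the lowest index wins.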