A Proof of the Lower Bounds

A.1 Lower Bound on the Exploration Cost: Proof of Theorem 1

Let us denote, for any model $\mu \in \mathbb{R}^{KM}$ and agent $m$, $k^\star_m = \arg\max_{k \in [K]} \mu_{m,k}$ the optimal arm of agent $m$ under $\mu$.
Neural Information Processing Systems
Assume that the stopping time $\tau$ is almost surely finite under $\mu$ for algorithm $\mathcal{A}$. Let $E$ be the event that arm $k$ is optimal for agent $m$.

For cumulative regret, the change-of-distribution lemma becomes an asymptotic result, stated as Lemma 6 below.

Lemma 6. Fix $\mu \in \mathbb{R}^{KM}$.

The proof uses a change-of-distribution argument, following a technique proposed by [16]. The conclusion follows from elementary real analysis, showing that the right-hand side of the inequality is larger than $(1-\varepsilon)\log(T)$ for $T$ large enough (how much larger depends in a complex way on $\mu$, $\lambda$, $\varepsilon$ and the algorithm).

At this point, we would like to select the alternative model $\lambda$ that leads to the tightest inequality in Lemma 6. However, this choice of alternative $\lambda$ depends on $T$, so we cannot apply Lemma 6 directly: it is asymptotic in $T$ and holds only for a fixed $\lambda$.
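For context, a standard fixed-confidence form of the change-of-distribution argument is the transportation lemma from the bandit literature (e.g., Kaufmann, Cappé and Garivier, 2016). The notation below is a sketch under assumptions: $N_{m,k}(\tau)$ denotes the number of draws of arm $k$ by agent $m$ up to the stopping time $\tau$, and $\mathrm{kl}$ is the binary relative entropy; neither symbol is defined in this excerpt.

```latex
% Transportation lemma (sketch): for any almost surely finite stopping
% time \tau, any alternative model \lambda, and any event E that is
% measurable with respect to the history up to \tau,
\sum_{m=1}^{M} \sum_{k=1}^{K}
  \mathbb{E}_{\mu}\!\left[ N_{m,k}(\tau) \right]
  \,\mathrm{KL}\!\left( \mu_{m,k}, \lambda_{m,k} \right)
\;\geq\;
\mathrm{kl}\!\left( \mathbb{P}_{\mu}(E), \mathbb{P}_{\lambda}(E) \right).
```

Choosing $E$ as the event that the algorithm's recommendation is correct, and $\lambda$ as an alternative model under which a different arm is optimal for some agent, makes the right-hand side large whenever the algorithm is correct under both models with high probability; this is the mechanism that breaks down in the cumulative-regret setting, where the inequality only holds asymptotically in $T$ for each fixed $\lambda$.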