Supplementary Materials A Numerical Example on Convergence Bounds

Neural Information Processing Systems 

We use the following numerical experiment to further illustrate our finite-time bounds on the convergence of double Q-learning. In this experiment, the optimal Q-function can be calculated explicitly, so the learning errors can be tracked. We choose γ = 0.8, α ...

We prove Lemma 1 by induction. First, it is easy to verify that the initial case is satisfied, i.e., ...

In this appendix, we provide a detailed proof of Theorem 1.
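The kind of experiment described above, where the optimal Q-function is computed explicitly and the learning error of double Q-learning is tracked against it, can be sketched as follows. This is only an illustrative sketch and not the authors' setup: the 2-state, 2-action MDP (`P`, `R`), the synchronous update scheme, the polynomial step size `1 / t**omega` with `omega = 0.6`, and all function names are assumptions introduced here. Only the discount factor γ = 0.8 comes from the text.

```python
import random

GAMMA = 0.8  # discount factor quoted in the text

# Hypothetical 2-state, 2-action MDP (an assumption, not taken from the paper).
# P[s][a] lists next-state probabilities; R[s][a] is a deterministic reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
N_S, N_A = 2, 2


def optimal_q(tol=1e-12):
    """Compute Q* by iterating the Bellman optimality operator to a fixed point."""
    q = [[0.0] * N_A for _ in range(N_S)]
    while True:
        new_q = [[R[s][a] + GAMMA * sum(P[s][a][t] * max(q[t]) for t in range(N_S))
                  for a in range(N_A)] for s in range(N_S)]
        diff = max(abs(new_q[s][a] - q[s][a])
                   for s in range(N_S) for a in range(N_A))
        q = new_q
        if diff < tol:
            return q


def sup_error(q, q_star):
    """Sup-norm distance between an estimate and Q*."""
    return max(abs(q[s][a] - q_star[s][a])
               for s in range(N_S) for a in range(N_A))


def double_q_learning(steps=50000, omega=0.6, seed=0):
    """Synchronous double Q-learning; returns the error trace of Q^A and Q*."""
    rng = random.Random(seed)
    q_star = optimal_q()
    q_a = [[0.0] * N_A for _ in range(N_S)]
    q_b = [[0.0] * N_A for _ in range(N_S)]
    errors = []
    for t in range(1, steps + 1):
        alpha = 1.0 / t ** omega  # assumed polynomial step size
        # Update one of the two estimators, chosen with probability 1/2 each.
        upd, other = (q_a, q_b) if rng.random() < 0.5 else (q_b, q_a)
        old = [row[:] for row in upd]  # freeze the iterate for this sweep
        for s in range(N_S):
            for a in range(N_A):
                nxt = rng.choices(range(N_S), weights=P[s][a])[0]
                # Greedy action from the updated estimator, value from the other.
                best = max(range(N_A), key=lambda b: old[nxt][b])
                target = R[s][a] + GAMMA * other[nxt][best]
                upd[s][a] = old[s][a] + alpha * (target - old[s][a])
        errors.append(sup_error(q_a, q_star))
    return errors, q_star


if __name__ == "__main__":
    errors, _ = double_q_learning()
    for t in (1, 10, 100, 1000, 10000, 50000):
        print(f"t={t:6d}  ||Q^A - Q*||_inf = {errors[t - 1]:.4f}")
```

Running the script prints the sup-norm error at a few iteration counts, which is the quantity a finite-time bound would be compared against. The decoupling of action selection (one estimator) from action evaluation (the other) is the defining feature of double Q-learning; the remaining choices here are for illustration only.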
