Supplementary Materials A Numerical Example on Convergence Bounds
Neural Information Processing Systems
We use the following numerical experiment to further illustrate our finite-time bounds on the convergence of double Q-learning. In this experiment, the optimal Q-function can be calculated explicitly, so the learning errors can be tracked. We choose γ = 0.8, α ...
We prove Lemma 1 by induction. First, it is easy to verify that the initial case is satisfied, i.e., ...
In this appendix, we provide a detailed proof of Theorem 1.
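The kind of numerical experiment described above, where the optimal Q-function is known and the learning error can be tracked, can be sketched as follows. This is only an illustrative sketch, not the authors' setup: the small random MDP (`make_mdp`), the synchronous update with one coin flip per iteration, the polynomial step size α_t = 1/t^ω, and the value ω = 0.6 are all assumptions made here, since the source's parameter list is truncated after γ = 0.8. The optimal Q-function is obtained by value iteration, and the error is measured as the sup-norm distance between one of the two estimators and that optimum.

```python
import numpy as np


def make_mdp(n_states=3, n_actions=2, seed=0):
    """Hypothetical random MDP: transition kernel P[s, a, s'] and rewards R[s, a]."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # each P[s, a, :] is a probability vector
    R = rng.random((n_states, n_actions))
    return P, R


def optimal_q(P, R, gamma, tol=1e-12):
    """Compute Q* by iterating the Bellman optimality operator to a fixed point."""
    Q = np.zeros_like(R)
    while True:
        Q_new = R + gamma * P @ Q.max(axis=1)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new


def double_q_learning(P, R, gamma=0.8, omega=0.6, n_iters=5000, seed=1):
    """Synchronous double Q-learning; returns the sup-norm error of Q^A per iteration.

    The step size alpha_t = 1 / t**omega is an assumed schedule, not taken from the source.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    q_star = optimal_q(P, R, gamma)
    QA = np.zeros_like(R)
    QB = np.zeros_like(R)
    errors = np.empty(n_iters)
    for t in range(1, n_iters + 1):
        alpha = 1.0 / t ** omega
        # Sample one next state for every (state, action) pair.
        next_s = np.array([
            [rng.choice(n_states, p=P[s, a]) for a in range(n_actions)]
            for s in range(n_states)
        ])
        if rng.random() < 0.5:
            # Update Q^A: pick the greedy action with Q^A, evaluate it with Q^B.
            greedy = QA.argmax(axis=1)
            target = R + gamma * QB[next_s, greedy[next_s]]
            QA += alpha * (target - QA)
        else:
            # Update Q^B: pick the greedy action with Q^B, evaluate it with Q^A.
            greedy = QB.argmax(axis=1)
            target = R + gamma * QA[next_s, greedy[next_s]]
            QB += alpha * (target - QB)
        errors[t - 1] = np.abs(QA - q_star).max()
    return errors


if __name__ == "__main__":
    P, R = make_mdp()
    errors = double_q_learning(P, R)
    for t in (1, 10, 100, 1000, 5000):
        print(f"iteration {t:5d}: ||Q^A - Q*||_inf = {errors[t - 1]:.4f}")
```

Plotting `errors` against the iteration count on a logarithmic scale gives a simple way to compare the observed decay with a finite-time bound of the kind the paper discusses.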
Nov-15-2025, 06:19:11 GMT