Supplementary Materials

A Proof of Theorem 2: Asymptotic Convergence of Robust Q-Learning

Neural Information Processing Systems 

… (15)

which is the expectation of the estimated update in line 5 of Algorithm 1.

A.1 Robust Bellman operator is a contraction

It was shown in [Iyengar, 2005; Roy et al., 2017] that the robust Bellman operator is a contraction. Here, for completeness, we include the proof for our R-contamination uncertainty set.

In this section, we develop the finite-time analysis of Algorithm 1.

B.1 Notations

We first introduce some notation. … (44)

Hence, from the Bernstein inequality ([Li et al., 2020]), we have that … . This completes the proof.

Lemma 4. For any $t \le T$, … .

In this section we prove Theorem 4. First note that for any $x, y \in \mathbb{R}$, … .

In this section we develop the finite-time analysis of the robust TDC algorithm. For the convenience of the proof, we add a projection step to the algorithm, i.e., we let $\theta$ … . The approach in [Kaledin et al., 2020] transforms the … .

D.1 Lipschitz Smoothness

In this section, we first show that $J(\theta)$ is Lipschitz.
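For reference, the smoothness property in question can be stated in its standard form; the constant $L_J$ below is illustrative notation introduced here, not taken from the original text.

```latex
% Standard definition of Lipschitz smoothness of J(theta):
% the gradient of J is Lipschitz with (illustrative) constant L_J,
\|\nabla J(\theta_1) - \nabla J(\theta_2)\| \le L_J \,\|\theta_1 - \theta_2\|
\qquad \forall\, \theta_1, \theta_2,
% which implies the quadratic upper bound (descent lemma):
J(\theta_2) \le J(\theta_1) + \langle \nabla J(\theta_1),\, \theta_2 - \theta_1 \rangle
  + \tfrac{L_J}{2}\,\|\theta_2 - \theta_1\|^2 .
```

The quadratic upper bound is what finite-time analyses typically use: it controls how much a single projected stochastic update can change $J$.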
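For completeness, one standard form of the Bernstein inequality invoked in the proof above is stated below; the exact variant used in [Li et al., 2020] may differ in constants.

```latex
% Bernstein's inequality: X_1, ..., X_n independent, zero-mean,
% |X_i| <= M almost surely, and sigma^2 = sum_i E[X_i^2]. Then for all t >= 0,
\mathbb{P}\!\left( \Bigl| \sum_{i=1}^{n} X_i \Bigr| \ge t \right)
\le 2 \exp\!\left( - \frac{t^2/2}{\sigma^2 + M t / 3} \right).
```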
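The contraction property of Section A.1 can also be checked numerically. The sketch below assumes the R-contamination uncertainty set $\{(1-R)\,p + R\,q : q \in \Delta(\mathcal{S})\}$, under which the adversary's worst case places mass $R$ on the lowest-value next state; all function and variable names are illustrative, not from Algorithm 1.

```python
import numpy as np

def robust_bellman(Q, P, r, gamma, R):
    """Robust Bellman operator under R-contamination (illustrative sketch):
    (T Q)(s, a) = r(s, a) + gamma * ((1 - R) * sum_s' P(s'|s, a) V(s') + R * min_s' V(s')),
    where V(s') = max_b Q(s', b) and the min term is the adversary's worst case."""
    V = Q.max(axis=1)                              # greedy value function, shape (S,)
    return r + gamma * ((1 - R) * (P @ V) + R * V.min())

rng = np.random.default_rng(0)
S, A, gamma, R = 5, 3, 0.9, 0.2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                  # normalize into a transition kernel
r = rng.random((S, A))

# Sup-norm distance should shrink by at least a factor gamma under the operator.
Q1, Q2 = rng.random((S, A)), rng.random((S, A))
lhs = np.abs(robust_bellman(Q1, P, r, gamma, R)
             - robust_bellman(Q2, P, r, gamma, R)).max()
rhs = gamma * np.abs(Q1 - Q2).max()
assert lhs <= rhs + 1e-12                          # gamma-contraction holds
```

Since both the linear term $(1-R)\,p^\top V$ and the worst-case term $\min_{s'} V(s')$ are 1-Lipschitz in $V$ under the sup-norm, the assertion holds for any random instance, not just this seed.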