Supplementary Materials

A Proof of Theorem 2: Asymptotic Convergence of Robust Q-Learning