Supplementary material for Uncorrected least-squares temporal difference with lambda-return
November 15, 2019

Abstract

This document provides supplementary material for Takayuki Osogami, "Uncorrected least-squares temporal difference with lambda-return," which appears in Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20) [Osogami, 2019].

A Proofs

In this section, we prove Theorem 1, Lemma 1, Theorem 2, Lemma 2, and Proposition 1. Note that equations (1)-(19) refer to those in Osogami [2019].

A.1 Proof of Theorem 1

From (7)-(8), we have the following equality:

\begin{align}
A^{\mathrm{Unc}}_{T+1}
&= \sum_{t=0}^{T} \phi_t \Big( \phi_t - (1-\lambda)\,\gamma \sum_{m=1}^{T-t} (\lambda\gamma)^{m-1} \phi_{t+m} \Big)^{\top} \tag{20} \\
&= \sum_{t=0}^{T-1} \phi_t \Big( \phi_t - (1-\lambda)\,\gamma \sum_{m=1}^{T-t} (\lambda\gamma)^{m-1} \phi_{t+m} \Big)^{\top} + \phi_T \phi_T^{\top} \tag{21} \\
&= \sum_{t=0}^{T-1} \phi_t \Big( \phi_t - (1-\lambda)\,\gamma \sum_{m=1}^{T-t-1} (\lambda\gamma)^{m-1} \phi_{t+m} - (1-\lambda)\,\gamma\,(\lambda\gamma)^{T-t-1} \phi_T \Big)^{\top} + \phi_T \phi_T^{\top} \tag{22} \\
&= A^{\mathrm{Unc}}_{T} - \sum_{t=0}^{T-1} \phi_t\,(1-\lambda)\,\gamma\,(\lambda\gamma)^{T-t-1} \phi_T^{\top} + \phi_T \phi_T^{\top} \tag{23} \\
&= A^{\mathrm{Unc}}_{T} + \Big( \sum_{t=0}^{T} (\lambda\gamma)^{T-t} \phi_t \Big) \phi_T^{\top} - \gamma \Big( \sum_{t=0}^{T-1} (\lambda\gamma)^{T-t-1} \phi_t \Big) \phi_T^{\top} \tag{24} \\
&= A^{\mathrm{Unc}}_{T} + ( z_T - \gamma\, z_{T-1} )\, \phi_T^{\top}. \notag
\end{align}

The recursive computation of the eligibility trace can be verified in a straightforward manner.
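The recursion derived above can also be checked numerically. The following is a minimal sketch, not the author's implementation. It assumes that the eligibility trace is z_T = \sum_{t=0}^{T} (\lambda\gamma)^{T-t} \phi_t with z_{-1} = 0, which is what the step from (24) to the final line requires, so that z_T = \lambda\gamma z_{T-1} + \phi_T. It also assumes A^Unc_0 = 0, the empty sum. The function names `direct_A`, `recursive_A`, and `max_abs_diff` are hypothetical and chosen only for illustration; feature vectors are plain Python lists.

```python
import random


def outer(u, v):
    """Outer product u v^T of two vectors given as lists."""
    return [[ui * vj for vj in v] for ui in u]


def add(A, B):
    """Entrywise sum of two matrices given as lists of rows."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def direct_A(phi, T, lam, gamma):
    """A^Unc_T computed term by term, i.e. (20) with T+1 replaced by T."""
    d = len(phi[0])
    A = [[0.0] * d for _ in range(d)]
    for t in range(T):
        # phi_t - (1 - lam) * gamma * sum_{m=1}^{T-t-1} (lam*gamma)^(m-1) phi_{t+m}
        target = list(phi[t])
        for m in range(1, T - t):
            w = (1 - lam) * gamma * (lam * gamma) ** (m - 1)
            target = [x - w * y for x, y in zip(target, phi[t + m])]
        A = add(A, outer(phi[t], target))
    return A


def recursive_A(phi, T, lam, gamma):
    """A^Unc_T via A_{k+1} = A_k + (z_k - gamma * z_{k-1}) phi_k^T."""
    d = len(phi[0])
    A = [[0.0] * d for _ in range(d)]
    z_prev = [0.0] * d  # assumed convention: z_{-1} = 0
    for k in range(T):
        # Trace recursion: z_k = lam * gamma * z_{k-1} + phi_k
        z = [lam * gamma * zp + p for zp, p in zip(z_prev, phi[k])]
        diff = [zi - gamma * zp for zi, zp in zip(z, z_prev)]
        A = add(A, outer(diff, phi[k]))
        z_prev = z
    return A


def max_abs_diff(A, B):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Compare both computations on random 3-dimensional features.
random.seed(0)
features = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
gap = max_abs_diff(direct_A(features, 8, 0.7, 0.9),
                   recursive_A(features, 8, 0.7, 0.9))
```

The direct computation costs a number of vector operations quadratic in T, whereas the recursive one needs a single pass and keeps only the previous trace in memory. Agreement of the two up to floating-point error on random inputs is a sanity check of the derivation, not a substitute for the proof.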
Nov-14-2019