Towards a Better Understanding of Representation Dynamics under TD-learning

Yunhao Tang, Rémi Munos

arXiv.org Artificial Intelligence 

Critical to the accuracy of value predictions is the quality of state representations. In this work, we consider the question: how does end-to-end TD-learning impact the representation over time? Complementary to prior work, we provide a set of analysis that sheds further light on the representation dynamics under TD-learning. We first show that when the environments are reversible, end-to-end TD-learning strictly decreases the value approximation error over time. Under further assumptions on the environments, we can connect the representation dynamics with spectral decomposition over the transition matrix. This latter finding establishes fitting multiple value functions from randomly generated rewards as a useful auxiliary task for representation learning, as we empirically validate on both tabular and Atari game suites.

Representation learning has led to much empirical success and is the core of many high-performing agents such as DQN (Mnih et al., 2013). A natural question ensues: can we characterize the representation learned by such end-to-end updates? The answer to this question has been attempted by a number of prior work, including the study of the convergence of end-to-end TD-learning under the over-parameterized regimes, i.e., when the value functions are learned by very wide neural networks (Cai et al., 2019; Zhang et al., 2020; Agazzi and Lu, 2022; Sirignano and Spiliopoulos, 2022); the study of TD-learning dynamics under smooth homogeneous function approximation, e.g., with ReLU networks (Brandfonbrener and Bruna, 2019); and the study of representation dynamics under TD-learning with restrictive assumptions on the weight parameter (Lyle et al., 2021). See Section 6 for an in-depth discussion.
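As a rough illustration of the ideas summarized above, the NumPy sketch below (not the authors' code; all names and hyperparameters are illustrative assumptions) builds a small reversible Markov chain, runs expected semi-gradient TD updates end-to-end on a shared linear representation with several value heads fit to randomly generated rewards, tracks the stationary-distribution-weighted value approximation error over time, and compares the learned feature subspace against top eigenvectors of the transition matrix.

```python
# Minimal tabular sketch, under illustrative assumptions: a reversible chain,
# a shared linear representation Phi trained end-to-end by expected semi-gradient
# TD, and auxiliary value heads for randomly generated rewards.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, n_heads = 20, 4, 16
gamma, lr, n_steps = 0.9, 0.1, 5000

# Reversible chain: symmetric weights, row-normalized; the stationary
# distribution mu is proportional to the row sums of the weight matrix.
W = rng.random((n_states, n_states))
W = (W + W.T) / 2.0
P = W / W.sum(axis=1, keepdims=True)
mu = W.sum(axis=1) / W.sum()
D = np.diag(mu)

# One randomly generated reward vector per auxiliary head, plus exact values.
R = rng.normal(size=(n_states, n_heads))
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, R)

# Shared representation (states x features) and per-head linear weights.
Phi = rng.normal(scale=0.1, size=(n_states, n_features))
Wh = rng.normal(scale=0.1, size=(n_features, n_heads))

def weighted_err(V):
    """mu-weighted value approximation error, averaged over heads."""
    return float(np.sqrt(np.mean(np.sum(mu[:, None] * (V - V_true) ** 2, axis=0))))

for t in range(n_steps + 1):
    V = Phi @ Wh                          # current value predictions
    if t % 1000 == 0:
        print(f"step {t:5d}  value error {weighted_err(V):.3f}")
    delta = (R + gamma * P @ V) - V       # expected TD errors (targets held fixed)
    # Semi-gradient TD updates on both the representation and the heads.
    Phi += lr * (D @ delta) @ Wh.T
    Wh += lr * Phi.T @ (D @ delta)

# For a reversible chain, D^{1/2} P D^{-1/2} is symmetric, so P has a real
# spectrum; recover its top eigenvectors and compare with the learned features.
S = np.diag(np.sqrt(mu)) @ P @ np.diag(1.0 / np.sqrt(mu))
evals, evecs = np.linalg.eigh(S)                        # ascending eigenvalues
top = np.diag(1.0 / np.sqrt(mu)) @ evecs[:, -n_features:]

def principal_angles(A, B):
    """Principal angles (degrees) between column spaces; 0 means aligned."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s, -1.0, 1.0)))

print("principal angles vs top eigenvectors (deg):",
      np.round(principal_angles(Phi, top), 1))
```

Expected (population) updates replace sampled transitions only to keep the sketch short, and the alignment check uses principal angles between subspaces, which are invariant to whichever basis the features happen to learn; none of this is meant to reproduce the paper's experiments.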
