Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
Yuda Song, Lili Wu, Dylan J. Foster, Akshay Krishnamurthy
It is becoming increasingly common to deploy algorithms for reinforcement learning and control in systems where the underlying ("latent") dynamics are nonlinear, continuous, and low-dimensional, yet the agent perceives the environment through high-dimensional ("rich") observations such as images from a camera (Wahlström et al., 2015; Levine et al., 2016; Kumar et al., 2021; Nair et al., 2023; Baker et al., 2022; Brohan et al., 2022). These domains demand that agents (i) efficiently explore in the face of complex nonlinearities, and (ii) learn continuous representations that respect the structure of the latent dynamics, ideally in tandem with exploration. In spite of extensive empirical investigation into modeling and algorithm design (Laskin et al., 2020; Yarats et al., 2021a; Hafner et al., 2023), sample efficiency and reliability remain major challenges (Dean et al., 2020), and our understanding of fundamental algorithmic principles for representation learning and exploration is still in its infancy. Toward understanding algorithmic principles and fundamental limits for reinforcement learning and control with high-dimensional observations, a recent line of theoretical research adopts the framework of rich-observation reinforcement learning (cf. Du et al., 2019; Misra et al., 2020; Mhammedi et al., 2020; Zhang et al., 2022; Mhammedi et al., 2023b). Rich-observation RL provides a mathematical framework for the design and analysis of algorithms that perform exploration in the presence of high-dimensional observations, with an emphasis on generalization and sample efficiency. However, existing work in this domain is largely restricted to systems with discrete ("tabular") latent dynamics, a restriction that is unsuitable for most real-world control applications.
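To make the setting concrete, the following is a minimal illustrative sketch (Python with NumPy) of an environment whose latent dynamics are continuous, nonlinear, and low-dimensional, but which the agent only observes through a high-dimensional, noisy "rendering" of the latent state. All class names, constants, and the random-projection rendering map here are hypothetical choices made for illustration; they are not the construction used in the paper.

```python
import numpy as np


class RichObservationPendulum:
    """Illustrative rich-observation environment (assumed, not from the paper):
    a 2-D nonlinear latent state (pendulum angle and angular velocity) that the
    agent only perceives through a high-dimensional observation vector."""

    def __init__(self, obs_dim=64 * 64, dt=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dt = dt
        self.obs_dim = obs_dim
        # Fixed random projection standing in for a rendering map
        # (latent state features -> rich observation).
        self.render_map = self.rng.standard_normal((obs_dim, 3))
        self.state = None

    def reset(self):
        # Latent state: (angle, angular velocity) -- low-dimensional, continuous.
        self.state = self.rng.uniform(low=[-np.pi, -1.0], high=[np.pi, 1.0])
        return self._observe()

    def step(self, action):
        theta, omega = self.state
        # Nonlinear continuous latent dynamics (damped, torque-controlled pendulum).
        omega = omega + self.dt * (-10.0 * np.sin(theta) - 0.1 * omega + float(action))
        theta = theta + self.dt * omega
        self.state = np.array([theta, omega])
        reward = -(theta ** 2 + 0.1 * omega ** 2)  # reward for staying near upright
        return self._observe(), reward

    def _observe(self):
        # The agent never sees self.state directly; it receives only this
        # high-dimensional, noisy "image-like" observation of the latent state.
        features = np.array([np.cos(self.state[0]), np.sin(self.state[0]), self.state[1]])
        noise = 0.01 * self.rng.standard_normal(self.obs_dim)
        return self.render_map @ features + noise


env = RichObservationPendulum()
obs = env.reset()
obs, reward = env.step(action=0.5)
print(obs.shape)  # (4096,) -- rich observation of a 2-D latent state
```

In this sketch the learner's twin challenges from the abstract are visible: it must explore the nonlinear latent dynamics while simultaneously learning, from the 4096-dimensional observations alone, a representation that recovers the 2-D latent structure.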
arXiv.org Artificial Intelligence
May-29-2024