cf708fc1decf0337aded484f8f4519ae-Supplemental.pdf

Neural Information Processing Systems

We found that training an inverse model is crucial for learning good representations. On the first row, a level from each environment that one-shot PPGS fails to solve (the white arrows represent the policy). Iterative Model Improvement. In general settings, collecting training trajectories by sampling actions uniformly at random does not grant sufficient coverage of the state space. GLAMOR. GLAMOR [34] learns inverse dynamics to achieve visual goals in Atari games. The only difference from PPGS in terms of settings is that we allow GLAMOR to collect data on-policy and for more interactions (2M).
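As a rough illustration of the two ideas in the excerpt above (an inverse model, and data collected by sampling actions uniformly at random), the following is a minimal tabular sketch. It is not the PPGS or GLAMOR implementation: the 4x4 gridworld, the function names, and the counting-based "model" are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict

# Hypothetical 4x4 gridworld; the size and action names are illustrative assumptions.
SIZE = 4
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action):
    """Deterministic transition; a move that would leave the grid keeps the agent in place."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    if 0 <= x < SIZE and 0 <= y < SIZE:
        return (x, y)
    return state


def collect_random_transitions(num_steps, rng):
    """Roll out one trajectory from (0, 0), sampling actions uniformly at random."""
    state = (0, 0)
    data = []
    for _ in range(num_steps):
        action = rng.choice(sorted(ACTIONS))
        next_state = step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data


def fit_inverse_model(transitions):
    """Tabular inverse dynamics: map (state, next_state) to the most frequent action."""
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        if state != next_state:  # blocked moves carry no usable action label
            counts[(state, next_state)][action] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


rng = random.Random(0)
transitions = collect_random_transitions(2000, rng)
inverse_model = fit_inverse_model(transitions)

# Fraction of grid cells visited at least once by the random policy.
visited = {state for state, _, _ in transitions}
coverage = len(visited) / (SIZE * SIZE)
```

In a grid this small, random actions are likely to reach most cells. The excerpt's point is that this stops holding in general settings, which is the stated motivation for iterative model improvement.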



Planning from Pixels using Inverse Dynamics Models

arXiv.org Artificial Intelligence

Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches.

Deep reinforcement learning has proven to be a powerful and effective framework for solving a diverse range of challenging decision-making problems (Silver et al., 2017a; Berner et al., 2019). However, these algorithms are typically trained to maximize a single reward function, ignoring information that is not directly relevant to the task at hand. This way of learning is in stark contrast to how humans learn (Tenenbaum, 2018). Without being prompted by a specific task, humans can still explore their environment, practice achieving imaginary goals, and in so doing learn about the dynamics of the environment. When subsequently presented with a novel task, humans can use this learned knowledge to bootstrap learning, a property we would like our artificial agents to have. In this work, we investigate one way to bridge this gap by learning world models (Ha & Schmidhuber, 2018) that enable the realization of previously unseen tasks. By modeling the task-agnostic dynamics of an environment, an agent can make predictions about how its own actions may affect the environment state without the need for additional samples from the environment. Prior work has shown that by using powerful function approximators to model environment dynamics, training an agent entirely within its own world models can result in large gains in sample efficiency (Ha & Schmidhuber, 2018).
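The abstract's core idea, predicting sequences of future actions conditioned on task completion, can be sketched in tabular form. This is a toy illustration under stated assumptions, not the method from the paper. The 1D chain environment, the fixed horizon, and all function names are hypothetical. It also assumes that the final state of a random segment is treated as the goal that segment "completed", which the text above does not specify.

```python
import random
from collections import Counter, defaultdict

# Hypothetical 1D chain environment; all constants here are illustrative assumptions.
N_STATES = 6
ACTIONS = (-1, +1)
HORIZON = 3


def step(state, action):
    """Deterministic move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def rollout(start, actions):
    """Apply a sequence of actions and return the final state."""
    state = start
    for action in actions:
        state = step(state, action)
    return state


def collect_segments(num_segments, rng):
    """Sample fixed-length random segments; the final state is relabeled as the goal."""
    segments = []
    for _ in range(num_segments):
        start = rng.randrange(N_STATES)
        actions = tuple(rng.choice(ACTIONS) for _ in range(HORIZON))
        segments.append((start, actions, rollout(start, actions)))
    return segments


def fit_sequence_model(segments):
    """Count which action sequences were observed to take `start` to `goal`."""
    counts = defaultdict(Counter)
    for start, actions, goal in segments:
        counts[(start, goal)][actions] += 1
    return counts


def plan(model, start, goal):
    """Return the most frequently observed action sequence for (start, goal), if any."""
    if (start, goal) not in model:
        return None
    return model[(start, goal)].most_common(1)[0][0]


rng = random.Random(0)
model = fit_sequence_model(collect_segments(3000, rng))
actions_to_goal = plan(model, start=0, goal=3)
```

Here the table of counts stands in for a learned conditional distribution over action sequences. In a pixel-based setting the lookup would be replaced by a neural network over observations, which this sketch does not attempt.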