Planning from Pixels using Inverse Dynamics Models
Paster, Keiran, McIlraith, Sheila A., Ba, Jimmy
–arXiv.org Artificial Intelligence
Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heuristic for planning with sparse rewards. We evaluate our method on challenging visual goal completion tasks and show a substantial increase in performance compared to prior model-free approaches. Deep reinforcement learning has proven to be a powerful and effective framework for solving a diversity of challenging decision-making problems (Silver et al., 2017a; Berner et al., 2019). However these algorithms are typically trained to maximize a single reward function, ignoring information that is not directly relevant to the associated task at hand. This way of learning is in stark contrast to how humans learn (Tenenbaum, 2018). Without being prompted by a specific task, humans can still explore their environment, practice achieving imaginary goals, and in so doing learn about the dynamics of the environment. When subsequently presented with a novel task, humans can utilize this learned knowledge to bootstrap learning -- a property we would like our artificial agents to have. In this work, we investigate one way to bridge this gap by learning world models (Ha & Schmidhuber, 2018) that enable the realization of previously unseen tasks. By modeling the task-agnostic dynamics of an environment, an agent can make predictions about how its own actions may affect the environment state without the need for additional samples from the environment. Prior work has shown that by using powerful function approximators to model environment dynamics, training an agent entirely within its own world models can result in large gains in sample efficiency (Ha & Schmidhuber, 2018).
arXiv.org Artificial Intelligence
Dec-4-2020
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- United States
- Massachusetts (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Los Angeles County
- Long Beach (0.14)
- Canada
- Ontario > Toronto (0.14)
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Alberta > Census Division No. 15
- Improvement District No. 9 > Banff (0.04)
- United States
- Europe > Sweden
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Oceania > Australia
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment (0.94)