Shaping Belief States with Generative Environment Models for RL Artificial Intelligence

When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents. We find that predicting multiple steps into the future (overshooting), in combination with an expressive generative model, is critical for stable representations to emerge. In practice, using expressive generative models in RL is computationally expensive and we propose a scheme to reduce this computational burden, allowing us to build agents that are competitive with model-free baselines.

A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

Neural Information Processing Systems

This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework for unsupervised learning of sequential data that disentangles two latent representations: an object's representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate high dimensional frames at each time step. The model is trained end-to-end on videos of a variety of simulated physical systems, and outperforms competing methods in generative and missing data imputation tasks.

Learning Latent Dynamics for Planning from Pixels Artificial Intelligence

Planning has been very successful for control tasks with known environment dynamics. To leverage planning in unknown environments, the agent needs to learn the dynamics from interactions with the world. However, learning dynamics models that are accurate enough for planning has been a long-standing challenge, especially in image-based domains. We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from pixels and chooses actions through online planning in latent space. To achieve high performance, the dynamics model must accurately predict the rewards ahead for multiple time steps. We approach this problem using a latent dynamics model with both deterministic and stochastic transition function and a generalized variational inference objective that we name latent overshooting. Using only pixel observations, our agent solves continuous control tasks with contact dynamics, partial observability, and sparse rewards. PlaNet uses significantly fewer episodes and reaches final performance close to and sometimes higher than top model-free algorithms.

Unsupervised Learning of Sensorimotor Affordances by Stochastic Future Prediction Machine Learning

Recently, much progress has been made building systems that can capture static image properties, but natural environments are intrinsically dynamic. For an intelligent agent, perception is responsible not only for capturing features of scene content, but also capturing its \textit{affordances}: how the state of things can change, especially as the result of the agent's actions. We propose an unsupervised method to learn representations of the sensorimotor affordances of an environment. We do so by learning an embedding for stochastic future prediction that is (i) sensitive to scene dynamics and minimally sensitive to static scene content and (ii) compositional in nature, capturing the fact that changes in the environment can be composed to produce a cumulative change. We show that these two properties are sufficient to induce representations that are reusable across visually distinct scenes that share degrees of freedom. We show the applicability of our method to synthetic settings and its potential for understanding more complex, realistic visual settings.

Model-Free Episodic Control Machine Learning

State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.