Reviews: Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Neural Information Processing Systems 

This paper describes a model-based reinforcement learning approach, which is applied to 4 of the MuJoCo continuous control tasks. The approach incorporates uncertainty in the forward dynamics model in two ways: by predicting a Gaussian distribution over future states rather than a single point, and by training an ensemble of models on different subsets of the agent's experience. As a controller, the authors use the cross-entropy method (CEM) to generate action sequences, which are then rolled out through the stochastic forward dynamics model to produce state trajectories. A reward sum is computed for each action-conditional trajectory, and the action corresponding to the highest predicted reward sum is executed. This is thus a form of model-predictive control. In their experiments, the authors show that their method matches the performance of state-of-the-art model-free approaches on 3 out of 4 tasks while using far fewer environment interactions, i.e., with improved sample complexity.
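To make the control loop summarized above concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the toy 1-D dynamics, the function names (`make_toy_ensemble`, `predicted_return`, `cem_plan`), the hyperparameters, and the choice to keep each sampled trajectory on one ensemble member for the whole rollout are all assumptions made for illustration. A real system would replace the hand-written ensemble with neural networks trained on the agent's experience.

```python
import numpy as np


def make_toy_ensemble(n_models=3, seed=0):
    """Hypothetical stand-in for a learned ensemble (not from the paper).

    Each member maps (state, action) to the mean and variance of a Gaussian
    over the next state. Members differ slightly, mimicking models trained
    on different subsets of the data.
    """
    rng = np.random.default_rng(seed)
    gains = 1.0 + 0.05 * rng.standard_normal(n_models)

    def make_member(gain):
        def predict(state, action):
            return state + 0.1 * gain * action, 1e-4  # (mean, variance)
        return predict

    return [make_member(g) for g in gains]


def predicted_return(models, state, actions, reward_fn, rng, n_particles=10):
    """Average predicted reward sum of one action sequence.

    Assumption of this sketch: each sampled trajectory (particle) is tied
    to one randomly chosen ensemble member for the whole rollout.
    """
    states = np.full(n_particles, state, dtype=float)
    members = rng.integers(len(models), size=n_particles)
    total = np.zeros(n_particles)
    for action in actions:
        nxt = np.empty_like(states)
        for m, model in enumerate(models):
            mask = members == m
            mean, var = model(states[mask], action)
            # Sample the next state from the predicted Gaussian.
            nxt[mask] = mean + np.sqrt(var) * rng.standard_normal(mask.sum())
        states = nxt
        total += reward_fn(states, action)
    return total.mean()


def cem_plan(models, state, reward_fn, horizon=10, pop=64,
             n_elite=8, iters=5, seed=0):
    """Cross-entropy method over action sequences; returns one action.

    Returns the first action of the best-scoring sequence found, as in
    model-predictive control: only that action is executed before replanning.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    best_score, best_seq = -np.inf, mu
    for _ in range(iters):
        # Sample candidate sequences, clipped to the action range [-1, 1].
        noise = rng.standard_normal((pop, horizon))
        candidates = np.clip(mu + sigma * noise, -1.0, 1.0)
        scores = np.array([
            predicted_return(models, state, seq, reward_fn, rng)
            for seq in candidates
        ])
        top = int(np.argmax(scores))
        if scores[top] > best_score:
            best_score, best_seq = scores[top], candidates[top]
        # Refit the sampling distribution to the highest-scoring sequences.
        elite = candidates[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return float(best_seq[0])


# Usage: drive the toy state from 0.0 toward a goal at 1.0.
ensemble = make_toy_ensemble()
first_action = cem_plan(ensemble, 0.0, lambda s, a: -(s - 1.0) ** 2)
print(first_action)
```

In a full control loop, the returned action would be applied to the environment, the resulting transition added to the training data, and planning repeated from the new state.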