Reviews: Regularizing Trajectory Optimization with Denoising Autoencoders

Neural Information Processing Systems 

The paper addresses a known problem in model-based Reinforcement Learning: trajectory optimization algorithms tend to exploit the inaccuracies of learned dynamics models. To reduce this, it proposes adding a regularizer to the optimization cost. The regularizer is an estimate of the log probability (over a local window) of the optimized trajectory under the distribution of known trajectories. The idea is to keep the optimized trajectories from deviating too far from the data used to learn the dynamics model, and hence to avoid unreliable solutions. The authors propose to estimate the log probability term with a denoising autoencoder network. They provide multiple experiments comparing their method to other state-of-the-art approaches on known environments/datasets.
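
To make the regularization idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the penalty takes the form of a squared denoising residual, which is motivated by the standard result that an optimal denoiser g trained with Gaussian noise of variance sigma^2 satisfies g(x) - x ≈ sigma^2 ∇ log p(x). A closed-form denoiser for Gaussian data stands in for the trained network, and all names (`gaussian_denoiser`, `dae_penalty`, `regularized_cost`, `alpha`) are hypothetical.

```python
def gaussian_denoiser(x, mean, data_var, noise_var):
    """Optimal denoiser for data ~ N(mean, data_var) corrupted by N(0, noise_var) noise.

    Assumption: this closed form replaces a trained denoising autoencoder.
    """
    shrink = noise_var / (data_var + noise_var)
    return [xi + shrink * (mi - xi) for xi, mi in zip(x, mean)]


def dae_penalty(x, denoise):
    """Squared denoising residual ||g(x) - x||^2; small where the data density is high."""
    return sum((gi - xi) ** 2 for gi, xi in zip(denoise(x), x))


def regularized_cost(x, task_cost, denoise, alpha):
    """Task cost plus a weighted penalty for leaving the training distribution."""
    return task_cost(x) + alpha * dae_penalty(x, denoise)


# Toy setup: a 4-dimensional "trajectory window" whose training data is centred at zero.
mean = [0.0, 0.0, 0.0, 0.0]
denoise = lambda x: gaussian_denoiser(x, mean, data_var=1.0, noise_var=0.1)

# Hypothetical task cost that rewards moving away from the data,
# mimicking an optimizer that exploits model error.
task_cost = lambda x: -sum(x)

near = [0.1, 0.1, 0.1, 0.1]  # close to the training data
far = [5.0, 5.0, 5.0, 5.0]   # far from the training data
```

Without the penalty (`alpha=0.0`) the optimizer prefers `far`. With a sufficiently large `alpha` the ordering flips and `near` becomes cheaper, which is the behaviour the regularizer is meant to produce.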