Learning to Predict Without Looking Ahead: World Models Without Forward Prediction

Neural Information Processing Systems 

In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agents ability to observe the real environment at each timestep.