Robust Predictable Control

Neural Information Processing Systems 

Our method jointly optimizes a latent-space model and policy to be self-consistent, such that the policy avoids states where the model is inaccurate.