Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Neural Information Processing Systems 

Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.