On Proximal Policy Optimization's Heavy-tailed Gradients
