Reparameterization Proximal Policy Optimization