Generalized Proximal Policy Optimization with Sample Reuse