Coordinated Proximal Policy Optimization

Neural Information Processing Systems 

The key idea is to coordinate the adaptation of step size across multiple agents during the policy update.
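One way to picture this coordination is a PPO-style clipped surrogate in which each agent's probability ratio is scaled by a clipped product of the other agents' ratios, so that a large policy change by the others shrinks the room left for this agent's own step. The sketch below is only an illustration under that assumption; the text above does not state the objective, and the function name `coordinated_surrogate` and the parameters `eps_inner` and `eps_outer` are hypothetical.

```python
from math import prod


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def coordinated_surrogate(ratios, advantage, agent, eps_inner=0.1, eps_outer=0.2):
    """Per-sample clipped surrogate for one agent (illustrative assumption).

    ratios    -- new/old policy probability ratios, one per agent
    advantage -- advantage estimate for this sample
    agent     -- index of the agent being updated
    eps_inner -- hypothetical clip range for the other agents' joint ratio
    eps_outer -- hypothetical clip range for the combined ratio
    """
    own = ratios[agent]
    # Joint ratio of all other agents, clipped so it cannot dominate the update.
    others = prod(r for j, r in enumerate(ratios) if j != agent)
    others_clipped = clip(others, 1.0 - eps_inner, 1.0 + eps_inner)
    # Combined ratio: this agent's step is scaled by how far the others moved.
    joint = others_clipped * own
    unclipped = joint * advantage
    clipped = clip(joint, 1.0 - eps_outer, 1.0 + eps_outer) * advantage
    # Pessimistic bound, as in the standard PPO clipped objective.
    return min(unclipped, clipped)


if __name__ == "__main__":
    # Two agents; agent 0 moved far from its old policy, agent 1 did not move.
    print(coordinated_surrogate([1.5, 1.0], advantage=1.0, agent=0))
```

In this sketch, when the other agents' ratios stay near 1 the objective reduces to the usual single-agent PPO clipping; when they drift, the combined ratio reaches the outer clip sooner, which is one plausible reading of a coordinated step size.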