Coordinated Proximal Policy Optimization

Neural Information Processing Systems 

In such applications, a team of agents aims to maximize a joint expected utility through a single global reward.