PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning

Neural Information Processing Systems 

To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework.