Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation
