Reviews: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
–Neural Information Processing Systems
Summary
-----------------
The paper presents a novel actor-critic algorithm, named MADDPG, for both cooperative and competitive multi-agent problems. MADDPG relies on a number of key ideas: 1) The action-value functions are learned in a 'centralized' manner, meaning that each agent's critic takes into account the actions of all other players. This makes it possible to evaluate the effect of the joint policy on each agent's long-term reward. 2) To remove the need to know other agents' actions, the authors suggest that each agent could learn an approximate model of the other agents' policies. 3) Each agent maintains an ensemble of policies; at each episode during the learning process, each agent draws a policy uniformly at random from its ensemble.
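To make two of the ideas summarized above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the linear critic, the names `centralized_q` and `draw_policy`, and the two toy policies are all hypothetical and chosen only for illustration. The sketch shows a critic that scores the joint observations and actions of all agents, and a uniform draw of one policy from an agent's ensemble at the start of an episode.

```python
import random


def centralized_q(weights, observations, actions):
    """Toy linear critic for one agent (an assumption, not the paper's network).

    It scores the joint input: every agent's observation followed by
    every agent's action, so other agents' actions affect the value.
    """
    joint = [x for obs in observations for x in obs]
    joint += [a for act in actions for a in act]
    if len(joint) != len(weights):
        raise ValueError("weights must match the joint input size")
    return sum(w * x for w, x in zip(weights, joint))


def draw_policy(ensemble, rng):
    """Pick one policy uniformly at random from an agent's ensemble."""
    return rng.choice(ensemble)


def cautious_policy(observation):
    # Hypothetical sub-policy: always outputs the zero action.
    return [0.0]


def bold_policy(observation):
    # Hypothetical sub-policy: always outputs the unit action.
    return [1.0]


if __name__ == "__main__":
    rng = random.Random(0)
    observations = [[0.5, 1.0], [2.0, 3.0]]  # one observation per agent
    ensemble = [cautious_policy, bold_policy]

    # One draw per episode; the drawn policy is then used for the whole episode.
    policy = draw_policy(ensemble, rng)
    own_action = policy(observations[0])
    other_action = [4.0]

    weights = [1.0, 0.0, 0.0, 1.0, 2.0, -1.0]
    print(centralized_q(weights, observations, [own_action, other_action]))
```

Because the critic's input includes the other agent's action, changing that action changes the value assigned to the first agent, which is the property the centralized training scheme relies on.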
Oct-7-2024, 23:22:34 GMT