Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods
Hu, Jian; Hu, Siyue; Liao, Shih-wei
arXiv.org Artificial Intelligence
Recent works have applied Proximal Policy Optimization (PPO) to multi-agent cooperative tasks, for example Independent PPO (IPPO) and vanilla Multi-agent PPO (MAPPO), which uses a centralized value function. However, previous literature shows that MAPPO may not perform as well as IPPO or the Fine-tuned QMIX on the StarCraft Multi-Agent Challenge (SMAC). MAPPO-Feature-Pruned (MAPPO-FP) improves the performance of MAPPO through carefully designed agent-specific features, which may not be friendly to algorithmic utility. By contrast, we find that MAPPO may suffer from the problem of Policies Overfitting in Multi-agent Cooperation (POMAC), because it learns policies from sampled advantage values. POMAC may in turn cause the multi-agent policies to be updated in a suboptimal direction and prevent the agents from exploring better trajectories. In this paper, to mitigate this overfitting of the multi-agent policies, we propose a novel policy regularization method that perturbs the advantage values with random Gaussian noise. The experimental results show that our method outperforms the Fine-tuned QMIX and MAPPO-FP, and achieves state-of-the-art (SOTA) performance on SMAC without agent-specific features. We open-source the code at https://github.com/hijkzzz/noisy-mappo.
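The core idea, perturbing advantage values with Gaussian noise before the policy update, can be illustrated with a minimal NumPy sketch. The abstract does not say exactly how the noise enters, so everything here is an assumption made for illustration: the noise is additive and zero-mean, it is drawn independently for each agent and sample, and the function names `perturb_advantages` and `ppo_clip_loss` and the scale `sigma` are hypothetical. The authors' actual implementation is in the repository linked above and may differ.

```python
import numpy as np


def perturb_advantages(advantages, sigma=0.1, rng=None):
    """Add zero-mean Gaussian noise to sampled advantage values.

    advantages: array of shape (batch, n_agents).
    sigma: noise standard deviation (hypothetical value, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    advantages = np.asarray(advantages, dtype=np.float64)
    # Assumption: one independent noise draw per (sample, agent) entry.
    noise = rng.normal(loc=0.0, scale=sigma, size=advantages.shape)
    return advantages + noise


def ppo_clip_loss(ratio, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, written as a quantity to minimize."""
    ratio = np.asarray(ratio, dtype=np.float64)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


rng = np.random.default_rng(0)
# With a centralized value function, all agents see the same advantage per sample.
adv = np.array([[1.0, 1.0, 1.0],
                [-0.5, -0.5, -0.5]])
noisy_adv = perturb_advantages(adv, sigma=0.1, rng=rng)

# Probability ratios of 1.0 correspond to the first update step after sampling.
ratio = np.ones_like(adv)
loss = ppo_clip_loss(ratio, noisy_adv)
```

In this sketch the agents start with identical advantage values and end up with slightly different ones, so each agent's policy gradient is no longer tied to exactly the same sampled estimate.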
Jun-8-2023