The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu
–arXiv.org Artificial Intelligence
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm, but it is used significantly less than off-policy learning algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant that adopts a centralized value function. Using a single desktop machine with one GPU, we show that MAPPO achieves performance comparable to the state of the art in three popular multi-agent testbeds: the Particle World environments, StarCraft II micromanagement tasks, and the Hanabi Challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that, compared to off-policy baselines, MAPPO achieves better or comparable sample complexity as well as substantially faster running time. Finally, through ablation studies, we present five factors that are most influential to MAPPO's practical performance.
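The core idea described in the abstract, a per-agent PPO clipped policy loss whose advantages come from a single centralized value function over the global state, can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the function names, array shapes, and the one-step TD advantage are assumptions made for brevity (the paper relies on GAE and other implementation details omitted here).

# Illustrative sketch of the MAPPO idea: each agent keeps its own PPO
# clipped policy loss, while advantages are computed from one shared,
# centralized value function that conditions on the global state.
# All names and shapes below are assumptions for illustration only.
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate objective (returned as a loss to minimize).
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

def mappo_losses(log_probs_new, log_probs_old, rewards, central_values, gamma=0.99):
    # log_probs_*: (n_agents, T) action log-probabilities per agent.
    # rewards: (T,) shared team reward.
    # central_values: (T + 1,) centralized critic estimates V(global state).
    # One-step TD advantage from the shared critic (the paper uses GAE instead).
    advantages = rewards + gamma * central_values[1:] - central_values[:-1]
    return [
        ppo_clip_loss(log_probs_new[i], log_probs_old[i], advantages)
        for i in range(log_probs_new.shape[0])
    ]

# Tiny usage example with random data: 3 agents, 5 timesteps.
rng = np.random.default_rng(0)
lp_new = rng.normal(size=(3, 5))
lp_old = lp_new + rng.normal(scale=0.01, size=(3, 5))
losses = mappo_losses(lp_new, lp_old, rng.normal(size=5), rng.normal(size=6))
print(losses)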
Mar-2-2021