Multi-Agent Collaboration via Reward Attribution Decomposition
Zhang, Tianjun; Xu, Huazhe; Wang, Xiaolong; Wu, Yi; Keutzer, Kurt; Gonzalez, Joseph E.; Tian, Yuandong
arXiv.org Artificial Intelligence
Recent advances in multi-agent reinforcement learning (MARL) have achieved superhuman performance in games like Quake 3 and Dota 2. Unfortunately, these techniques require orders of magnitude more training rounds than humans and may not generalize to slightly altered environments or new agent configurations (i.e., ad hoc team play). In this work, we propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance in the StarCraft multi-agent challenge and supports ad hoc team play. We first formulate multi-agent collaboration as a joint optimization on reward assignment and show that, under certain conditions, each agent has a decentralized Q-function that is approximately optimal and can be decomposed into two terms: the self-term, which relies only on the agent's own state, and the interactive term, which is related to the states of nearby agents, often observed by the current agent. The two terms are jointly trained using regular DQN, regularized with a Multi-Agent Reward Attribution (MARA) loss that ensures both terms retain their semantics. CollaQ is evaluated on various StarCraft maps, outperforming existing state-of-the-art techniques (i.e., QMIX, QTRAN, and VDN) by improving the win rate by 40% with the same number of environment steps. In the more challenging ad hoc team play setting (i.e., reweight/add/remove units without retraining or finetuning), CollaQ outperforms the previous state of the art by over 30%.
In recent years, multi-agent deep reinforcement learning has drawn increasing interest from the research community. MARL algorithms have shown superhuman-level performance in various games like Dota 2 (Berner et al., 2019), Quake 3 Arena (Jaderberg et al., 2019), and StarCraft (Samvelyan et al., 2019). However, the underlying algorithms (Schulman et al., 2017; Mnih et al., 2013) are far less sample efficient than humans.
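The two-term decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear Q-functions, the function names, and the feature sizes are all hypothetical, and the penalty shown is only one assumed way a reward-attribution regularizer could keep the interactive term from absorbing what the self-term should explain (here, by pushing the interactive term toward zero when no nearby agents are observed). The abstract itself does not give the exact form of the MARA loss.

```python
import numpy as np

def q_self(w_self, own_state):
    """Self-term: depends only on the agent's own state."""
    return w_self @ own_state

def q_interactive(w_int, own_state, nearby_states):
    """Interactive term: depends on the agent's own state and nearby agents' states."""
    joint = np.concatenate([own_state, nearby_states])
    return w_int @ joint

def q_total(w_self, w_int, own_state, nearby_states):
    """Decentralized Q-values for one agent: self-term plus interactive term."""
    return q_self(w_self, own_state) + q_interactive(w_int, own_state, nearby_states)

def attribution_penalty(w_int, own_state, n_nearby_features):
    """Assumed regularizer (hypothetical form): squared interactive term
    evaluated with the nearby-agent features zeroed out, i.e. agent alone."""
    alone = np.zeros(n_nearby_features)
    return float(np.sum(q_interactive(w_int, own_state, alone) ** 2))

# Hypothetical sizes: 4 own-state features, 6 nearby-agent features, 3 actions.
rng = np.random.default_rng(0)
own = rng.normal(size=4)
nearby = rng.normal(size=6)
w_self = rng.normal(size=(3, 4))
w_int = rng.normal(size=(3, 10))

q_values = q_total(w_self, w_int, own, nearby)
penalty = attribution_penalty(w_int, own, n_nearby_features=6)
```

In a training loop, a term like `penalty` would be added to the usual DQN temporal-difference loss with some weighting coefficient; that coefficient and the network architecture are not specified in the text above.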
Oct-16-2020
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- California
- Alameda County > Berkeley (0.04)
- San Diego County > San Diego (0.04)
- Genre:
- Research Report > Promising Solution (0.34)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.76)