qtran
We thank all the reviewers for their feedback. All reviewers are concerned with whether we substantially outperform QMIX. Since StarCraft II experiments take a long time, we could not include all the results in the submission. Samvelyan et al. classified the SMAC maps as Easy, Hard, and Super Hard; results on several maps are shown below.
Our initial experiments (implementation, debugging, hyperparameter tuning, etc.) required about 5000 CPU hours of compute. Due to these rules, it is recommended to group together in order to attack simultaneously. In Warehouse[4], QTRAN makes slightly faster progress than VAST (η = 12). The results for Warehouse[16], Battle[80], and GaussianSqueeze[800] are shown in Figure 1. Figure 10: Visualizations of the generated sub-teams of XMetaGrad with η = 14 and XSpatial with k-means clustering using 10 centroids at different stages (early, middle, late) in Battle[80] after training. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Innate-Values-driven Reinforcement Learning for Cooperative Multi-Agent Systems
Innate values describe agents' intrinsic motivations, which reflect their inherent interests and preferences for pursuing goals and drive them to develop the diverse skills that satisfy their various needs. The essence of reinforcement learning (RL) is learning from interaction through reward-driven behaviors (such as utilities), much as natural agents do. This makes it an excellent model for describing the innate-values-driven (IV) behaviors of AI agents. Especially in multi-agent systems (MAS), building agents' awareness of how to balance group utilities against system costs and satisfy group members' needs in cooperation is a crucial problem if individuals are to learn to support their community and integrate into human society in the long term. This paper proposes a hierarchical compound intrinsic-value reinforcement learning model, termed innate-values-driven reinforcement learning (IVRL), to describe the complex behaviors of multi-agent interaction in cooperation. We implement the IVRL architecture in the StarCraft Multi-Agent Challenge (SMAC) environment and compare the cooperative performance of three innate-value agent characters (Coward, Neutral, and Reckless) under three benchmark multi-agent RL algorithms: QMIX, IQL, and QTRAN. The results demonstrate that by organizing individual agents' various needs rationally, the group can achieve better performance at lower cost.
- North America > United States > Illinois > Peoria County > Peoria (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
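The IVRL abstract above weighs agents' intrinsic needs against group utility and contrasts Coward, Neutral, and Reckless characters. As a rough illustration only, the sketch below shows one way a profile-weighted compound reward could be formed; the need channels, weight values, and the `compound_reward` helper are hypothetical assumptions, not the paper's architecture.

```python
# A minimal sketch (not the authors' code): each agent blends the task
# reward with intrinsic "need" signals, and the weight profile is what
# distinguishes Coward/Neutral/Reckless agents. All names and numbers
# here are illustrative assumptions.
import numpy as np

# Hypothetical need channels: (damage_dealt, damage_taken, team_support)
PROFILES = {
    "coward":   np.array([0.2, -1.0, 0.6]),
    "neutral":  np.array([0.5, -0.5, 0.5]),
    "reckless": np.array([1.0, -0.1, 0.2]),
}

def compound_reward(task_reward: float, needs: np.ndarray, profile: str,
                    beta: float = 0.5) -> float:
    """Blend the environment reward with a profile-weighted intrinsic term."""
    intrinsic = float(PROFILES[profile] @ needs)
    return task_reward + beta * intrinsic

# Example: the same transition scored under different innate-value profiles.
needs = np.array([3.0, 2.0, 1.0])  # dealt 3 dmg, took 2, gave 1 support
for p in PROFILES:
    print(p, round(compound_reward(1.0, needs, p), 2))
```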
QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning
Kyunghwan Son, Sungsoo Ahn, Roben Delos Reyes, Jinwoo Shin, Yung Yi
QTRAN is a multi-agent reinforcement learning (MARL) algorithm capable of learning the largest class of joint-action value functions to date. However, despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we identify the performance bottleneck of QTRAN and propose a substantially improved version, coined QTRAN++. Our gains come from (i) stabilizing the training objective of QTRAN, (ii) removing the strict role separation between the action-value estimators of QTRAN, and (iii) introducing a multi-head mixing network for value transformation. Through extensive evaluation, we confirm that our diagnosis is correct, and that QTRAN++ successfully bridges the gap between empirical performance and theoretical guarantee. In particular, QTRAN++ newly achieves state-of-the-art performance in the SMAC environment. The code will be released.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
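Improvement (iii) in the QTRAN++ abstract is a multi-head mixing network for value transformation. The PyTorch sketch below shows one plausible shape for such a mixer: several state-conditioned heads each mix the per-agent utilities, and the head outputs are summed. The head count, layer sizes, and aggregation are assumptions rather than the paper's exact design.

```python
# A minimal sketch of a multi-head mixing network, assuming QMIX-style
# non-negative (monotone) mixing weights produced by per-head
# hypernetworks from the global state.
import torch
import torch.nn as nn

class MultiHeadMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, n_heads: int = 4,
                 hidden: int = 32):
        super().__init__()
        # One hypernetwork per head yields mixing weights over agents.
        self.weight_nets = nn.ModuleList(
            [nn.Linear(state_dim, n_agents) for _ in range(n_heads)])
        self.bias_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        head_vals = []
        for net in self.weight_nets:
            w = torch.abs(net(state))                 # (batch, n_agents), >= 0
            head_vals.append((w * agent_qs).sum(-1))  # (batch,)
        mixed = torch.stack(head_vals, -1).sum(-1, keepdim=True)  # (batch, 1)
        return mixed + self.bias_net(state)

mixer = MultiHeadMixer(n_agents=3, state_dim=8)
q_tot = mixer(torch.randn(5, 3), torch.randn(5, 8))
print(q_tot.shape)  # torch.Size([5, 1])
```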
QPLEX: Duplex Dueling Multi-Agent Q-Learning
Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang
We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). CTDE requires consistency between the optimal joint action selection and the optimal individual action selections, known as the IGM (Individual-Global-Max) principle. However, in order to achieve scalability, existing MARL methods either limit the representational expressiveness of their value function classes or relax the IGM consistency, which may lead to poor policies or even divergence. This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which takes a duplex dueling network architecture to factorize the joint value function. This duplex dueling architecture transforms the IGM principle into easily realized constraints on advantage functions and thus enables efficient value function learning. Theoretical analysis shows that QPLEX solves a rich class of tasks. Empirical experiments on StarCraft II unit micromanagement tasks demonstrate that QPLEX significantly outperforms state-of-the-art baselines in both online and offline task settings, and also reveal that QPLEX achieves high sample efficiency and can benefit from offline datasets without additional exploration.
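The QPLEX abstract turns the IGM principle, that the joint greedy action equals the tuple of individual greedy actions, into constraints on advantage functions. The numpy sketch below illustrates why positively weighted, non-positive per-agent advantages preserve each individual argmax; the fixed lambda values are a stand-in assumption for QPLEX's learned attention weights.

```python
# A minimal sketch of the advantage-based duplex dueling idea: per-agent
# advantages A_i = Q_i - max_a Q_i are <= 0 with equality at each agent's
# greedy action, so scaling them by any positive lambda_i leaves the
# argmax unchanged and the IGM property holds by construction.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
q_i = rng.normal(size=(n_agents, n_actions))   # per-agent utilities
lam = np.array([0.7, 1.3, 2.0])                # positive weights (assumed)

v_i = q_i.max(axis=1)                          # per-agent state values
a_i = q_i - v_i[:, None]                       # advantages, <= 0
greedy = q_i.argmax(axis=1)                    # individual greedy actions

def q_tot(actions):
    """Joint value: sum of values plus positively weighted advantages."""
    adv = a_i[np.arange(n_agents), actions]
    return v_i.sum() + (lam * adv).sum()

# IGM check: no joint action beats the tuple of individual argmaxes.
best = max(product(range(n_actions), repeat=n_agents), key=q_tot)
assert tuple(best) == tuple(greedy)
print("joint greedy action:", best, "Q_tot:", round(q_tot(best), 3))
```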
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Earl Hostallero, Yung Yi
We explore value-based solutions for multi-agent reinforcement learning (MARL) tasks in the recently popularized centralized training with decentralized execution (CTDE) regime. VDN and QMIX are representative examples that use the idea of factorizing the joint action-value function into individual ones for decentralized execution. However, VDN and QMIX address only a fraction of factorizable MARL tasks due to structural constraints in their factorization, such as additivity and monotonicity. In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes a new approach: transforming the original joint action-value function into an easily factorizable one with the same optimal actions. QTRAN guarantees more general factorization than VDN or QMIX, thus covering a much wider class of MARL tasks than previous methods do. Our experiments on multi-domain Gaussian-squeeze and modified predator-prey tasks demonstrate QTRAN's superior performance, with especially large margins in games whose payoffs penalize non-cooperative behavior more aggressively.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
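The QTRAN abstract replaces the additivity and monotonicity constraints of VDN/QMIX with a condition relating the factorized sum to the transformed joint value: the sum of individual utilities minus the joint value plus a state-value term must be zero at the optimal joint action and non-negative elsewhere. The numpy sketch below checks that condition on a non-monotonic matrix game in the style of the paper's example; the payoff values and hand-picked utilities are illustrative assumptions.

```python
# A minimal sketch of QTRAN's sufficient factorization condition:
#   sum_i Q_i(a_i) - Q_jt(a) + V_jt >= 0, with equality at the optimum.
# It holds here even though the payoff is not monotone in the Q_i.
import numpy as np
from itertools import product

q_jt = np.array([[8., -12., -12.],   # joint payoff; optimum is (0, 0)
                 [-12., 0., 0.],
                 [-12., 0., 0.]])
q1 = np.array([4., 0., 0.])          # hand-picked individual utilities
q2 = np.array([4., 0., 0.])

a_bar = (int(q1.argmax()), int(q2.argmax()))       # greedy action tuple
v_jt = q_jt.max() - (q1[a_bar[0]] + q2[a_bar[1]])  # here: 0.0

for a in product(range(3), repeat=2):
    gap = q1[a[0]] + q2[a[1]] - q_jt[a] + v_jt
    # Equality exactly at the optimal joint action, >= 0 elsewhere.
    assert gap == 0 if a == a_bar else gap >= 0
print("greedy tuple", a_bar, "attains the joint optimum:",
      q_jt[a_bar] == q_jt.max())
```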