In Section 3.4, we analyzed that I2Q can easily solve tasks with multiple optimal joint policies; here, we give another way to address this problem. D3G cannot obtain a positive win rate in SMAC, as shown in Table 1. Although the QSS value is a biased estimate in this implementation, the implementation without the forward model is practical. The results are shown in Figure 16.
PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
Li, Pengyi, Tang, Hongyao, Yang, Tianpei, Hao, Xiaotian, Sang, Tong, Zheng, Yan, Hao, Jianye, Taylor, Matthew E., Tao, Wenyuan, Wang, Zhen, Barez, Fazl
Learning to collaborate is critical in Multi-Agent Reinforcement Learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal that sub-optimal collaborative behaviors also emerge with strong correlations, and that simply maximizing the MI can, surprisingly, hinder learning toward better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is to maximize the MI associated with superior collaborative behaviors and minimize the MI associated with inferior ones. The two MI objectives play complementary roles: they facilitate better collaboration while avoiding falling into sub-optimal behaviors. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
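The PMIC abstract above describes a dual objective: maximize the MI between global states and joint actions on superior trajectories, and minimize it on inferior ones. As a minimal sketch of that idea, the toy code below uses a count-based plug-in estimate of MI over two hypothetical buffers; the actual PMIC framework uses neural MI estimators, and the function names (`empirical_mi`, `pmic_objective`) and the weight `beta` are illustrative assumptions, not the paper's API.

```python
import numpy as np

def empirical_mi(states, actions):
    """Plug-in estimate of I(S; A) from paired discrete samples."""
    states, actions = np.asarray(states), np.asarray(actions)
    n = len(states)
    _, s_idx = np.unique(states, return_inverse=True)
    _, a_idx = np.unique(actions, return_inverse=True)
    joint = np.zeros((s_idx.max() + 1, a_idx.max() + 1))
    for i, j in zip(s_idx, a_idx):
        joint[i, j] += 1.0
    joint /= n                                   # empirical joint p(s, a)
    ps = joint.sum(axis=1, keepdims=True)        # marginal p(s)
    pa = joint.sum(axis=0, keepdims=True)        # marginal p(a)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (ps @ pa)[mask])).sum())

def pmic_objective(superior, inferior, beta=0.5):
    """Maximize MI on the superior buffer, minimize it on the inferior one."""
    mi_sup = empirical_mi(*superior)
    mi_inf = empirical_mi(*inferior)
    return mi_sup - beta * mi_inf
```

With a deterministic state-to-action mapping in the superior buffer and an independent pairing in the inferior one, the objective reduces to the entropy of the superior behavior, illustrating why the two terms pull in opposite directions.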
MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning
Su, Kefan, Zhou, Siyuan, Jiang, Jiechuan, Gan, Chuang, Wang, Xiangjun, Lu, Zongqing
Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL). However, non-stationarity remains a significant challenge in fully decentralized learning. In this paper, we tackle the non-stationarity problem in the simplest and most fundamental way and propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning. MA2QL is a minimalist approach to fully decentralized cooperative MARL but is theoretically grounded. We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium. In practice, MA2QL requires only minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL on a variety of cooperative multi-agent tasks. Results show that MA2QL consistently outperforms IQL, which, despite such minimal changes, verifies the effectiveness of MA2QL.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
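The core mechanism in the MA2QL abstract above, agents taking turns so that each learner faces a stationary problem, can be sketched in tabular form. The snippet below is a toy illustration on a one-shot cooperative matrix game, not the paper's deep-RL setting; the payoff matrix, learning rate, and turn schedule are assumptions chosen for the sketch.

```python
import numpy as np

# Cooperative one-shot matrix game: both agents receive R[a1, a2].
R = np.array([[4.0, 0.0],
              [0.0, 2.0]])

rng = np.random.default_rng(0)
q1 = np.zeros(2)   # agent 1's decentralized Q over its own two actions
q2 = np.zeros(2)   # agent 2's decentralized Q
alpha = 0.5

def greedy(q):
    return int(np.argmax(q))

# Alternate updates: while one agent learns, the teammate's policy is
# frozen, so the learner sees a stationary single-agent problem.
for turn in range(20):
    learner, frozen = (q1, q2) if turn % 2 == 0 else (q2, q1)
    for _ in range(50):
        a = int(rng.integers(2))     # exploratory action for the learner
        b = greedy(frozen)           # frozen teammate acts greedily
        r = R[a, b] if turn % 2 == 0 else R[b, a]
        learner[a] += alpha * (r - learner[a])   # one-step Q-learning (gamma = 0)

joint = (greedy(q1), greedy(q2))     # joint greedy policy after training
```

In this game the alternating updates settle on the joint action (0, 0) with payoff 4, a Nash equilibrium, consistent with the convergence result stated in the abstract; the only change relative to plain IQL is the outer turn-taking loop.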
Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control
de Witt, Christian Schroeder, Peng, Bei, Kamienny, Pierre-Alexandre, Torr, Philip, Böhmer, Wendelin, Whiteson, Shimon
Deep multi-agent reinforcement learning (MARL) holds the promise of automating many real-world cooperative robotic manipulation and transportation tasks. Nevertheless, decentralised cooperative robotic control has received less attention from the deep reinforcement learning community, as compared to single-agent robotics and multi-agent games with discrete actions. To address this gap, this paper introduces Multi-Agent Mujoco, an easily extensible multi-agent benchmark suite for robotic control in continuous action spaces. The benchmark tasks are diverse and admit easily configurable partially observable settings. Inspired by the success of single-agent continuous value-based algorithms in robotic control, we also introduce COMIX, a novel extension to a common discrete action multi-agent $Q$-learning algorithm. We show that COMIX significantly outperforms state-of-the-art MADDPG on a partially observable variant of a popular particle environment and matches or surpasses it on Multi-Agent Mujoco. Thanks to this new benchmark suite and method, we can now pose an interesting question: what is the key to performance in such settings, the use of value-based methods instead of policy gradients, or the factorisation of the joint $Q$-function? To answer this question, we propose a second new method, FacMADDPG, which factors MADDPG's critic. Experimental results on Multi-Agent Mujoco suggest that factorisation is the key to performance.
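The abstract above concludes that factorisation of the joint $Q$-function is the key to performance. The sketch below illustrates, in a deliberately simplified discrete setting, why monotonic factorisation makes greedy action selection decentralizable: with nonnegative mixing weights, each agent maximizing its own utility also maximizes the mixed total. This is a toy stand-in, not COMIX itself, which operates in continuous action spaces with a cross-entropy search and a learned mixing network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-agent utilities over 5 discrete actions each (stand-ins for the
# continuous per-agent utilities that COMIX would optimize).
q1 = rng.normal(size=5)
q2 = rng.normal(size=5)

# Monotonic mixing: nonnegative weights guarantee that increasing any
# agent's utility never decreases Q_tot (the condition behind QMIX-style
# factorisation).
w = np.array([0.7, 1.3])

def q_tot(a1, a2):
    return w[0] * q1[a1] + w[1] * q2[a2]

# Decentralized greedy choice: each agent argmaxes its own utility...
dec = (int(np.argmax(q1)), int(np.argmax(q2)))

# ...which matches the exhaustive maximization over all joint actions.
joint = max(((a1, a2) for a1 in range(5) for a2 in range(5)),
            key=lambda p: q_tot(*p))
```

The same monotonicity argument is what lets a factored method scale: the joint argmax costs a sum of per-agent searches rather than a search over the exponential joint action space.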