Agent Societies
A Privacy-preserving Distributed Training Framework for Cooperative Multi-agent Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) sometimes needs a large amount of data to converge during training, and in some cases each action of the agent may produce regret. This barrier naturally motivates owners of different data sets or environments to cooperate, sharing their knowledge to train their agents more efficiently. However, directly merging the raw data from different owners raises privacy concerns. To solve this problem, we propose a new Deep Neural Network (DNN) architecture with both a global NN and a local NN, together with a distributed training framework. We allow the global weights to be updated by all the collaborator agents, while the local weights are updated only by the agent they belong to. In this way, we hope the global weights can share the common knowledge among these collaborators while the local NN keeps the specialized properties and ensures that each agent is compatible with its specific environment. Experiments show that the framework can efficiently help agents in the same or similar environments to collaborate in their training process and gain a higher convergence rate and better performance.
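The split between shared global weights and private local weights can be illustrated with a minimal sketch. The abstract does not say how the collaborators' updates to the global weights are combined, so the gradient averaging below is an assumption, and the names Agent, sgd_step and training_round are hypothetical. Only gradients cross agent boundaries here, never raw data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    # Weights of the agent's private local NN (hypothetical flat vector).
    local_w: List[float]


def sgd_step(weights: List[float], grad: List[float], lr: float) -> List[float]:
    """Plain gradient-descent update on a flat weight vector."""
    return [w - lr * g for w, g in zip(weights, grad)]


def training_round(global_w, agents, global_grads, local_grads, lr=0.1):
    """One illustrative round.

    Assumption: global gradients from all agents are averaged before the
    shared weights are updated. Each agent's local weights are updated
    only with that agent's own local gradient.
    """
    n = len(agents)
    avg = [sum(g[i] for g in global_grads) / n for i in range(len(global_w))]
    new_global = sgd_step(global_w, avg, lr)
    for agent, grad in zip(agents, local_grads):
        agent.local_w = sgd_step(agent.local_w, grad, lr)
    return new_global
```

In this sketch an agent with a zero local gradient keeps its local weights unchanged even though the shared weights still move, which is the separation the abstract describes.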
Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines
Hu, Jueming, Xu, Zhe, Wang, Weichang, Qu, Guannan, Pang, Yutian, Liu, Yongming
In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP) where the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and expose the internal structure of its reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle computational complexity, we propose a decentralized learning algorithm called decentralized graph-based reinforcement learning using reward machines (DGRM), which equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to them. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. Furthermore, the complexity of DGRM is related to the local information size of the largest $\kappa$-hop neighborhood, and DGRM can find an $O(\rho^{\kappa+1})$-approximation of a stationary point of the objective function. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated in two case studies, UAV package delivery and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation.
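A reward machine can be read as a small finite-state machine whose transitions fire on high-level events and emit rewards. The sketch below is a generic illustration rather than the paper's implementation; the class RewardMachine, the helper make_delivery_rm, and the event names "pickup" and "deliver" are hypothetical. It shows why the reward is non-Markovian: "deliver" pays off only if "pickup" happened earlier.

```python
from typing import Dict, Tuple


class RewardMachine:
    """Finite-state machine mapping (state, event) to (next_state, reward)."""

    def __init__(self, initial: str,
                 transitions: Dict[Tuple[str, str], Tuple[str, float]]):
        self.initial = initial
        self.state = initial
        self.transitions = transitions

    def reset(self) -> None:
        self.state = self.initial

    def step(self, event: str) -> float:
        # Events with no matching transition leave the state unchanged
        # and yield zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def make_delivery_rm() -> RewardMachine:
    # Hypothetical two-step task: pick up a package, then deliver it.
    return RewardMachine("u0", {
        ("u0", "pickup"): ("u1", 0.0),
        ("u1", "deliver"): ("u2", 1.0),
    })
```

Feeding the events "deliver", "pickup", "deliver" in that order yields a reward only on the last step, because the first "deliver" arrives before the machine has left its initial state.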
Global cooperation on autonomous driving advancing sector-Ecns.cn
An autonomous bus has a test drive with passengers aboard in Qingdao, Shandong province on Sept 19. International carmakers are partnering with Chinese companies to tailor autonomous driving solutions for their vehicles sold in the world's largest vehicle market. Last week, the largest carmaker in the United States said it is investing $300 million in Chinese autonomous driving startup Momenta. The deal is expected to accelerate General Motors' development of self-driving technologies for its vehicles in China, said Julian Blissett, executive vice-president of GM and president of GM China. "Customers in China are embracing electrification and advanced self-driving technology faster than anywhere else in the world," Blissett said.
[ICML 2021 Spotlight] DFAC Framework: Factorizing the Value Function via Quantile Mixture for…
In multi-agent reinforcement learning (MARL), the environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. One popular research direction is to enhance the training procedure of fully cooperative and decentralized agents. In the past few years, a number of MARL researchers have turned their attention to centralized training with decentralized execution (CTDE). Among these CTDE approaches, value function factorization methods are especially promising because of their superior performance and data efficiency. Value function factorization methods introduce the assumption of individual-global-max (IGM) [1], which assumes that each agent's optimal actions result in the optimal joint actions of the entire group. Based on IGM, the total return of a group of agents can be factorized into separate utility functions for each agent.
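The IGM condition can be checked numerically on a toy example. The sketch below assumes an additive factorization, where the joint value is the sum of per-agent utilities. This is chosen for simplicity and is not the quantile mixture that DFAC uses. The function names and the utility tables are hypothetical.

```python
from itertools import product


def greedy_joint_action(utilities):
    """Each agent independently picks the action maximizing its own utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in utilities)


def best_joint_action(utilities):
    """Brute-force argmax of the joint value.

    Assumption: the joint value is the sum of the per-agent utilities.
    """
    action_spaces = [range(len(q)) for q in utilities]
    return max(
        product(*action_spaces),
        key=lambda joint: sum(q[a] for q, a in zip(utilities, joint)),
    )


# Two agents with three and two actions respectively (made-up numbers).
utilities = [[0.1, 0.9, 0.3], [0.5, 0.2]]
```

Under this additive assumption the per-agent greedy choice and the brute-force joint optimum coincide, which is what IGM requires and what makes decentralized execution possible without searching the joint action space.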
Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning
Alfano, Carlo, Rebeschini, Patrick
Cooperative multi-agent reinforcement learning is a decentralized paradigm in sequential decision making where agents distributed over a network iteratively collaborate with neighbors to maximize global (network-wide) notions of rewards. Exact computations typically involve a complexity that scales exponentially with the number of agents. To address this curse of dimensionality, we design a scalable algorithm based on the Natural Policy Gradient framework that uses local information and only requires agents to communicate with neighbors within a certain range. Under standard assumptions on the spatial decay of correlations for the transition dynamics of the underlying Markov process and the localized learning policy, we show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity, incurring a localization error that does not depend on the number of agents and converges to zero exponentially fast as a function of the range of communication.
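Restricting communication to neighbors within a certain range can be pictured as a bounded breadth-first search over the agent network. The sketch below assumes that "range" means hop distance in an undirected graph given as an adjacency dictionary; the function name neighbors_within_range and that interpretation are assumptions, not details from the abstract.

```python
from collections import deque


def neighbors_within_range(adjacency, source, max_hops):
    """Return the set of agents at most max_hops hops from source (inclusive)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the communication range
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


# Hypothetical line network of five agents: 0 - 1 - 2 - 3 - 4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On the line network, agent 2 with a range of one hop sees only agents 1, 2 and 3. Appending more agents to either end of the line leaves that set unchanged, which gives some intuition for why the amount of local information need not grow with the total number of agents.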
Amid Skepticism, Biden Vows a New Era of Global Collaboration
Joe Biden made his début at the elegant green-marble rostrum of the United Nations this week, as the coronavirus infected more than half a million people each day worldwide, as wildfires and floods aggravated by climate change ravaged the Earth, and as the U.S. struggled to prevent a new cold war with China. In lofty language, the President tried to redirect the world's focus away from the calamitous end to America's longest war, in Afghanistan, and a recent bust-up with its most longstanding ally, France. Just eight months into his Presidency, Biden is already trying to hit reset on his foreign policy. "I stand here today for the first time in twenty years with the United States not at war. We've turned the page," Biden told the chamber.
Towards Multi-Agent Reinforcement Learning using Quantum Boltzmann Machines
Müller, Tobias, Roch, Christoph, Schmid, Kyrill, Altmann, Philipp
Reinforcement learning has driven impressive advances in machine learning. Simultaneously, quantum-enhanced machine learning algorithms using quantum annealing are undergoing heavy development. Recently, a multi-agent reinforcement learning (MARL) architecture combining both paradigms has been proposed. This novel algorithm, which utilizes Quantum Boltzmann Machines (QBMs) for Q-value approximation, has outperformed regular deep reinforcement learning in terms of the time-steps needed to converge. However, this algorithm was restricted to single-agent and small 2x2 multi-agent grid domains. In this work, we propose an extension to the original concept in order to solve more challenging problems. Similar to classic DQNs, we add an experience replay buffer and use different networks for approximating the target and policy values. The experimental results show that learning becomes more stable and that agents can find optimal policies in grid domains with higher complexity. Additionally, we assess how parameter sharing influences the agents' behavior in multi-agent domains. Quantum sampling proves to be a promising method for reinforcement learning tasks, but is currently limited by the QPU size and therefore by the size of the input and Boltzmann machine.
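The two DQN-style ingredients mentioned above, an experience replay buffer and a separate target network, can be sketched generically. The code below leaves out the Quantum Boltzmann Machine entirely and is not the authors' implementation; ReplayBuffer, maybe_sync_target and the dictionary representation of parameters are hypothetical, and periodic hard copying is an assumed sync rule.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def maybe_sync_target(policy_params, target_params, step, sync_every):
    """Copy policy parameters into the target every sync_every steps.

    Assumption: a periodic hard copy; between syncs the target is frozen.
    """
    if step % sync_every == 0:
        return dict(policy_params)
    return target_params
```

Keeping the target parameters frozen between syncs means the bootstrapped targets change slowly, which is one common explanation for the more stable learning that such setups show.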
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning
Zohar, Roy, Mannor, Shie, Tennenholtz, Guy
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. As environments grow in size, effective credit assignment becomes increasingly hard and often results in infeasible learning times. Still, in many real-world settings, there exist simplified underlying dynamics that can be leveraged for more scalable solutions. In this work, we exploit such locality structures effectively whilst maintaining global cooperation. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm. Additionally, we provide a direct reward decomposition method for finding these local rewards when only a global signal is provided. We test our method empirically, showing it scales well compared to other methods, significantly improving performance and convergence speed.
Regularize! Don't Mix: Multi-Agent Reinforcement Learning without Explicit Centralized Structures
Siu, Chapman, Traish, Jason, Da Xu, Richard Yi
We propose using regularization for Multi-Agent Reinforcement Learning rather than learning explicit cooperative structures, an approach we call Multi-Agent Regularized Q-learning (MARQ). Many MARL approaches leverage centralized structures in order to exploit global state information or remove communication constraints when the agents act in a decentralized manner. Instead of learning redundant structures that are removed during agent execution, we propose to leverage shared experiences of the agents to regularize the individual policies in order to promote structured exploration. We examine several different approaches to how MARQ can either explicitly or implicitly regularize our policies in a multi-agent setting. MARQ aims to address these limitations in the MARL context by applying regularization constraints, which can correct bias in off-policy out-of-distribution agent experiences and promote diverse exploration. Our algorithm is evaluated on several benchmark multi-agent environments, and we show that MARQ consistently outperforms several baselines and state-of-the-art algorithms, learning in fewer steps and converging to higher returns.
Greedy UnMixing for Q-Learning in Multi-Agent Reinforcement Learning
Siu, Chapman, Traish, Jason, Da Xu, Richard Yi
This paper introduces Greedy UnMix (GUM) for cooperative multi-agent reinforcement learning (MARL). Greedy UnMix aims to avoid scenarios where MARL methods fail due to overestimation of values as part of the large joint state-action space. It addresses this with a conservative Q-learning approach that restricts the state-marginal in the dataset to avoid unobserved joint state-action spaces, whilst concurrently attempting to unmix or simplify the problem space under the centralized training with decentralized execution paradigm. We demonstrate adherence to Q-function lower bounds in Q-learning for MARL scenarios, and demonstrate superior performance to existing Q-learning MARL approaches as well as more general MARL algorithms over a set of benchmark MARL tasks, despite its relative simplicity compared with state-of-the-art approaches.