Agents
Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks
Sha, Xingyu, Zhang, Jiaqi, Zhang, Kaiqing, You, Keyou, Başar, Tamer
This paper proposes a \emph{fully asynchronous} scheme for policy evaluation of distributed reinforcement learning (DisRL) over peer-to-peer networks. Without any form of coordination, nodes can communicate with neighbors and compute their local variables using (possibly) delayed information at any time, which is in sharp contrast to the asynchronous gossip. Thus, the proposed scheme fully takes advantage of the distributed setting. We prove that our method converges at a linear rate $\mathcal{O}(c^k)$ where $c\in(0,1)$ and $k$ increases by one no matter on which node updates, showing the computational advantage by reducing the amount of synchronization. Numerical experiments show that our method speeds up linearly w.r.t. the number of nodes, and is robust to straggler nodes. To the best of our knowledge, our work is the first theoretical analysis for asynchronous update in DisRL, including the \emph{parallel RL} domain advocated by A3C.
Army researchers enhance AI critical to Soldier-machine teamwork
Artificial intelligence possesses the capacity to achieve incredible results, but cannot always work alone. Researchers identified two key components in successful human-machine collaboration that may enhance how the U.S. Army will fight in the future. To achieve dominance in what is known as multi-domain operations, warfighters will need a layered intelligence, surveillance and reconnaissance, or ISR, network that maintains a functional relationship between autonomous sensors, human intelligence and friendly special operations forces. Multi-domain operations, known as MDO, is a joint warfighting concept that foresees conflict occurring in multiple domains: land, air, sea, cyber and space. The concept has many nuances, but basically describes how the Army, as part of the joint force, will solve the problem of layered standoff in all domains.
Reward Design for Driver Repositioning Using Multi-Agent Reinforcement Learning
A large portion of passenger requests is reportedly unserviced, partially due to vacant for-hire drivers' cruising behavior during the passenger seeking process. This paper aims to model the multi-driver repositioning task through a mean field multi-agent reinforcement learning (MARL) approach that captures competition among multiple agents. Because the direct application of MARL to the multi-driver system under a given reward mechanism will likely yield a suboptimal equilibrium due to the selfishness of drivers, this study proposes an reward design scheme with which a more desired equilibrium can be reached. To effectively solve the bilevel optimization problem with upper level as the reward design and the lower level as a multi-agent system, a Bayesian optimization (BO) algorithm is adopted to speed up the learning process. We then apply the bilevel optimization model to two case studies, namely, e-hailing driver repositioning under service charge and multiclass taxi driver repositioning under NYC congestion pricing. In the first case study, the model is validated by the agreement between the derived optimal control from BO and that from an analytical solution. With a simple piecewise linear service charge, the objective of the e-hailing platform can be increased by 4.0%. In the second case study, an optimal toll charge of $5.1 is solved using BO, which improves the objective of city planners by 7.9%, compared to that without any toll charge. Under this optimal toll charge, the number of taxis in the NYC central business district is decreased, indicating a better traffic condition, without substantially increasing the crowdedness of the subway system.
Corrupted Multidimensional Binary Search: Learning in the Presence of Irrational Agents
Krishnamurthy, Akshay, Lykouris, Thodoris, Podimata, Chara
Standard game-theoretic formulations for settings like contextual pricing and security games assume that agents act in accordance with a specific behavioral model. In practice however, some agents may not prescribe to the dominant behavioral model or may act in ways that are arbitrarily inconsistent. Existing algorithms heavily depend on the model being (approximately) accurate for all agents and have poor performance in the presence of even a few such arbitrarily irrational agents. How do we design learning algorithms that are robust to the presence of arbitrarily irrational agents? We address this question for a number of canonical game-theoretic applications by designing a robust algorithm for the fundamental problem of multidimensional binary search. The performance of our algorithm degrades gracefully with the number of corrupted rounds, which correspond to irrational agents and need not be known in advance. As binary search is the key primitive in algorithms for contextual pricing, Stackelberg Security Games, and other game-theoretic applications, we immediately obtain robust algorithms for these settings. Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis, and may be of independent algorithmic interest.
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
Hughes, Edward, Anthony, Thomas W., Eccles, Tom, Leibo, Joel Z., Balduzzi, David, Bachrach, Yoram
Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum games, the challenge is usually viewed as finding Nash equilibrium strategies, safeguarding against exploitation regardless of the opponent. While this captures the intricacies of chess or Go, it avoids the notion of cooperation with co-players, a hallmark of the major transitions leading from unicellular organisms to human civilization. Beyond two players, alliance formation often confers an advantage; however this requires trust, namely the promise of mutual cooperation in the face of incentives to defect. Successful play therefore requires adaptation to co-players rather than the pursuit of non-exploitability. Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research. Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma, and empirically that na\"ive multi-agent reinforcement learning therefore fails to form alliances. We introduce a toy model of economic competition, and show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances. Finally, we generalize our agent model to incorporate temporally-extended contracts, presenting opportunities for further work.
C-CoCoA: A Continuous Cooperative Constraint Approximation Algorithm to Solve Functional DCOPs
Sarker, Amit, Arif, Abdullahil Baki, Choudhury, Moumita, Khan, Md. Mosaddek
Distributed Constraint Optimization Problems (DCOPs) have been widely used to coordinate interactions (i.e. constraints) in cooperative multi-agent systems. The traditional DCOP model assumes that variables owned by the agents can take only discrete values and constraints' cost functions are defined for every possible value assignment of a set of variables. While this formulation is often reasonable, there are many applications where the variables are continuous decision variables and constraints are in functional form. To overcome this limitation, Functional DCOP (F-DCOP) model is proposed that is able to model problems with continuous variables. The existing F-DCOPs algorithms experience huge computation and communication overhead. This paper applies continuous non-linear optimization methods on Cooperative Constraint Approximation (CoCoA) algorithm. We empirically show that our algorithm is able to provide high-quality solutions at the expense of smaller communication cost and execution time compared to the existing F-DCOP algorithms.
Learning Optimal Temperature Region for Solving Mixed Integer Functional DCOPs
Mahmud, Saaduddin, Khan, Md. Mosaddek, Choudhury, Moumita, Tran-Thanh, Long, Jennings, Nicholas R.
Distributed Constraint Optimization Problems (DCOPs) are an important framework that models coordinated decision-making problem in multi-agent systems with a set of discrete variables. Later work has extended this to model problems with a set of continuous variables (F-DCOPs). In this paper, we combine both of these models into the Mixed Integer Functional DCOP (MIF-DCOP) model that can deal with problems regardless of its variables' type. We then propose a novel algorithm, called Distributed Parallel Simulated Annealing (DPSA), where agents cooperatively learn the optimal parameter configuration for the algorithm while also solving the given problem using the learned knowledge. Finally, we empirically benchmark our approach in DCOP, F-DCOP and MIF-DCOP settings and show that DPSA produces solutions of significantly better quality than the state-of-the-art non-exact algorithms in their corresponding setting.
Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
Thomason, Jesse (University of Washington) | Padmakumar, Aishwarya | Sinapov, Jivko | Walker, Nick | Jiang, Yuqian | Yedidsion, Harel | Hart, Justin | Stone, Peter | Mooney, Raymond
In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy. The agent can be used for showing navigation routes, delivering objects to people, and relocating objects from one location to another. We use dialog clarification questions both to understand commands and to generate additional parsing training data. The agent employs opportunistic active learning to select questions about how words relate to objects, improving its understanding of perceptual concepts. We evaluated this agent on Amazon Mechanical Turk. After training on data induced from conversations, the agent reduced the number of dialog questions it asked while receiving higher usability ratings. Additionally, we demonstrated the agent on a robotic platform, where it learned new perceptual concepts on the fly while completing a real-world task.
Review, Analyze, and Design a Comprehensive Deep Reinforcement Learning Framework
Nguyen, Ngoc Duy, Nguyen, Thanh Thi, Nguyen, Hai, Nahavandi, Saeid
Reinforcement learning (RL) has emerged as a standard approach for building an intelligent system, which involves multiple self-operated agents to collectively accomplish a designated task. More importantly, there has been a great attention to RL since the introduction of deep learning that essentially makes RL feasible to operate in high-dimensional environments. However, current research interests are diverted into different directions, such as multi-agent and multi-objective learning, and human-machine interactions. Therefore, in this paper, we propose a comprehensive software architecture that not only plays a vital role in designing a connect-the-dots deep RL architecture but also provides a guideline to develop a realistic RL application in a short time span. By inheriting the proposed architecture, software managers can foresee any challenges when designing a deep RL-based system. As a result, they can expedite the design process and actively control every stage of software development, which is especially critical in agile development environments. For this reason, we designed a deep RL-based framework that strictly ensures flexibility, robustness, and scalability. Finally, to enforce generalization, the proposed architecture does not depend on a specific RL algorithm, a network configuration, the number of agents, or the type of agents.
A Visual Communication Map for Multi-Agent Deep Reinforcement Learning
Nguyen, Ngoc Duy, Nguyen, Thanh Thi, Nahavandi, Saeid
Multi-agent learning distinctly poses significant challenges in the effort to allocate a concealed communication medium. Agents receive thorough knowledge from the medium to determine subsequent actions in a distributed nature. Apparently, the goal is to leverage the cooperation of multiple agents to achieve a designated objective efficiently. Recent studies typically combine a specialized neural network with reinforcement learning to enable communication between agents. This approach, however, limits the number of agents or necessitates the homogeneity of the system. In this paper, we have proposed a more scalable approach that not only deals with a great number of agents but also enables collaboration between dissimilar functional agents and compatibly combined with any deep reinforcement learning methods. Specifically, we create a global communication map to represent the status of each agent in the system visually. The visual map and the environmental state are fed to a shared-parameter network to train multiple agents concurrently. Finally, we select the Asynchronous Advantage Actor-Critic (A3C) algorithm to demonstrate our proposed scheme, namely Visual communication map for Multi-agent A3C (VMA3C). Simulation results show that the use of visual communication map improves the performance of A3C regarding learning speed, reward achievement, and robustness in multi-agent problems.