Agent Societies
Multi-Agent Advisor Q-Learning
Ganapathi Subramanian, Sriram (U Waterloo) | Taylor, Matthew E. (University of Alberta) | Larson, Kate (University of Waterloo) | Crowley, Mark (University of Waterloo)
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL) but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before wide-spread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms: can be used in a variety of environments, have performances that compare favourably to other related baselines, can scale to large state-action spaces, and are robust to poor advice from advisors.
Strategic Maneuver and Disruption with Reinforcement Learning Approaches for Multi-Agent Coordination
Asher, Derrik E., Basak, Anjon, Fernandez, Rolando, Sharma, Piyush K., Zaroukian, Erin G., Hsu, Christopher D., Dorothy, Michael R., Mahre, Thomas, Galindo, Gerardo, Frerichs, Luke, Rogers, John, Fossaceca, John
Reinforcement learning (RL) approaches can illuminate emergent behaviors that facilitate coordination across teams of agents as part of a multi-agent system (MAS), which can provide windows of opportunity in various military tasks. Technologically advancing adversaries pose substantial risks to a friendly nation's interests and resources. Superior resources alone are not enough to defeat adversaries in modern complex environments because adversaries create standoff in multiple domains against predictable military doctrine-based maneuvers. Therefore, as part of a defense strategy, friendly forces must use strategic maneuvers and disruption to gain superiority in complex multi-faceted domains such as multi-domain operations (MDO). One promising avenue for implementing strategic maneuver and disruption to gain superiority over adversaries is through coordination of MAS in future military operations. In this paper, we present overviews of prominent works in the RL domain with their strengths and weaknesses for overcoming the challenges associated with performing autonomous strategic maneuver and disruption in military contexts.
Cluster Assignment in Multi-Agent Systems
Abstract--We study cluster assignment in multi-agent networks. The process of reaching an agreement between agents is In this work we focus on homogeneous networks, that is one of the fundamental tasks for a multi-agent system (MAS). The problem we aim to solve is how to design graphs computation [1], robotics [2], biochemical systems [3], and that ensure the networked system will converge to a prescribed sensor networks [4]. A natural extension to the agreement cluster configuration, i.e., specifying the number of clusters problem is the cluster agreement problem, which seeks to and the number of agents within each cluster. Employing tools drive agents into groups. All the agents within the same group from group theory, we show that it is possible to design an should then reach an agreement.
Social nucleation: Group formation as a phase transition
Schweitzer, Frank, Andres, Georges
The spontaneous formation and subsequent growth, dissolution, merger and competition of social groups bears similarities to physical phase transitions in metastable finite systems. We examine three different scenarios, percolation, spinodal decomposition and nucleation, to describe the formation of social groups of varying size and density. In our agent-based model, we use a feedback between the opinions of agents and their ability to establish links. Groups can restrict further link formation, but agents can also leave if costs exceed the group benefits. We identify the critical parameters for costs/benefits and social influence to obtain either one large group or the stable coexistence of several groups with different opinions. Analytic investigations allow to derive different critical densities that control the formation and coexistence of groups. Our novel approach sheds new light on the early stage of network growth and the emergence of large connected components.
Scalable Online Planning for Multi-Agent MDPs
Choudhury, Shushman (Lacuna) | Gupta, Jayesh K. (Microsoft) | Morales, Peter (Microsoft) | Kochenderfer, Mykel J. (Stanford University)
We present a scalable tree search planning algorithm for large multi-agent sequential decision problems that require dynamic collaboration. Teams of agents need to coordinate decisions in many domains, but naive approaches fail due to the exponential growth of the joint action space with the number of agents. We circumvent this complexity through an approach that allows us to trade computation for approximation quality and dynamically coordinate actions. Our algorithm comprises three elements: online planning with Monte Carlo Tree Search (MCTS), factored representations of local agent interactions with coordination graphs, and the iterative Max-Plus method for joint action selection. We evaluate our approach on the benchmark SysAdmin domain with static coordination graphs and achieve comparable performance with much lower computation cost than our MCTS baselines. We also introduce a multi-drone delivery domain with dynamic coordination graphs, and demonstrate how our approach scales to large problems on this domain that are intractable for other MCTS methods.
Fully Decentralized, Scalable Gaussian Processes for Multi-Agent Federated Learning
Kontoudis, George P., Stilwell, Daniel J.
In this paper, we propose decentralized and scalable algorithms for Gaussian process (GP) training and prediction in multi-agent systems. To decentralize the implementation of GP training optimization algorithms, we employ the alternating direction method of multipliers (ADMM). A closed-form solution of the decentralized proximal ADMM is provided for the case of GP hyper-parameter training with maximum likelihood estimation. Multiple aggregation techniques for GP prediction are decentralized with the use of iterative and consensus methods. In addition, we propose a covariance-based nearest neighbor selection strategy that enables a subset of agents to perform predictions. The efficacy of the proposed methods is illustrated with numerical experiments on synthetic and real data.
DEC-LOS-RRT: Decentralized Path Planning for Multi-robot Systems with Line-of-sight Constrained Communication
Tuck, Victoria, Pant, Yash Vardhan, Seshia, Sanjit A., Sastry, S. Shankar
Decentralized planning for multi-agent systems, such as fleets of robots in a search-and-rescue operation, is often constrained by limitations on how agents can communicate with each other. One such limitation is the case when agents can communicate with each other only when they are in line-of-sight (LOS). Developing decentralized planning methods that guarantee safety is difficult in this case, as agents that are occluded from each other might not be able to communicate until it's too late to avoid a safety violation. In this paper, we develop a decentralized planning method that explicitly avoids situations where lack of visibility of other agents would lead to an unsafe situation. Building on top of an existing Rapidly-exploring Random Tree (RRT)-based approach, our method guarantees safety at each iteration. Simulation studies show the effectiveness of our method and compare the degradation in performance with respect to a clairvoyant decentralized planning algorithm where agents can communicate despite not being in LOS of each other.
Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning
Dou, Zehao, Kuba, Jakub Grudzien, Yang, Yaodong
Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games. For such a decomposition rule to hold, the assumption of the individual-global max (IGM) principle must be made; that is, the local maxima on the decomposed value function per every agent must amount to the global maximum on the joint value function. This principle, however, does not have to hold in general. As a result, the applicability of value decomposition algorithms is concealed and their corresponding convergence properties remain unknown. In this paper, we make the first effort to answer these questions. Specifically, we introduce the set of cooperative games in which the value decomposition methods find their validity, which is referred as decomposable games. In decomposable games, we theoretically prove that applying the multi-agent fitted Q-Iteration algorithm (MA-FQI) will lead to an optimal Q-function. In non-decomposable games, the estimated Q-function by MA-FQI can still converge to the optimum under the circumstance that the Q-function needs projecting into the decomposable function space at each iteration. In both settings, we consider value function representations by practical deep neural networks and derive their corresponding convergence rates. To summarize, our results, for the first time, offer theoretical insights for MARL practitioners in terms of when value decomposition algorithms converge and why they perform well.
Cooperative Solutions to Exploration Tasks Under Speed and Budget Constraints
We present a multi-agent system where agents can cooperate to solve a system of dependent tasks, with agents having the capability to explore a solution space, make inferences, as well as query for information under a limited budget. Re-exploration of the solution space takes place by an agent when an older solution expires and is thus able to adapt to dynamic changes in the environment. We investigate the effects of task dependencies, with highly-dependent graph $G_{40}$ (a well-known program graph that contains $40$ highly interlinked nodes, each representing a task) and less-dependent graphs $G_{18}$ (a program graph that contains $18$ tasks with fewer links), increasing the speed of the agents and the complexity of the problem space and the query budgets available to agents. Specifically, we evaluate trade-offs between the agent's speed and query budget. During the experiments, we observed that increasing the speed of a single agent improves the system performance to a certain point only, and increasing the number of faster agents may not improve the system performance due to task dependencies. Favoring faster agents during budget allocation enhances the system performance, in line with the "Matthew effect." We also observe that allocating more budget to a faster agent gives better performance for a less-dependent system, but increasing the number of faster agents gives a better performance for a highly-dependent system.
Answer Set Planning: A Survey
Son, Tran Cao, Pontelli, Enrico, Balduccini, Marcello, Schaub, Torsten
Answer Set Planning refers to the use of Answer Set Programming (ASP) to compute plans, i.e., solutions to planning problems, that transform a given state of the world to another state. The development of efficient and scalable answer set solvers has provided a significant boost to the development of ASP-based planning systems. This paper surveys the progress made during the last two and a half decades in the area of answer set planning, from its foundations to its use in challenging planning domains. The survey explores the advantages and disadvantages of answer set planning. It also discusses typical applications of answer set planning and presents a set of challenges for future research.