 Agent Societies


The Importance of Credo in Multiagent Learning

arXiv.org Artificial Intelligence

We propose a model for multi-objective optimization, a credo, for agents in a system that are configured into multiple groups (i.e., teams). Our model of credo regulates how agents optimize their behavior for the groups they belong to. We evaluate credo in the context of challenging social dilemmas with reinforcement learning agents. Our results indicate that the interests of teammates, or the entire system, are not required to be fully aligned for achieving globally beneficial outcomes. We identify two scenarios without full common interest that achieve high equality and significantly higher mean population rewards compared to when the interests of all agents are aligned.

The recent call to make cooperation central to the development of AI places emphasis on understanding the mechanisms behind teamwork beyond just competition [14, 15] and to adapt findings from Organizational Psychology [5]. In MARL, agents learning to cooperate often build common interest by sharing exogenous rewards [1, 7]; however, purely pro-social agents may not be possible when considering agents designed by different manufacturers or hybrid AI/human populations. Agents in these settings may have some self-interest for personal goals; therefore, it is important to understand how and when cooperation can be supported in systems where agents may ...
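As a rough illustration of how a credo might regulate what an agent optimizes, the sketch below blends an agent's own reward with the mean reward of its team and of the whole system. The three-way weighting, the function name `credo_reward`, and the toy rewards are assumptions made for illustration; the abstract does not specify the model's form.

```python
from statistics import mean


def credo_reward(agent, rewards, teams, credo):
    """Blend self, team, and system rewards with hypothetical credo weights.

    credo is an assumed (self, team, system) weight triple that sums to 1.
    """
    w_self, w_team, w_system = credo
    if abs(w_self + w_team + w_system - 1.0) > 1e-9:
        raise ValueError("credo weights are assumed to sum to 1")
    # Find the team this agent belongs to (each agent is on exactly one team here).
    team = next(t for t in teams if agent in t)
    team_reward = mean(rewards[a] for a in team)
    system_reward = mean(rewards.values())
    return w_self * rewards[agent] + w_team * team_reward + w_system * system_reward


# Toy data: four agents split into two teams.
rewards = {"a": 4.0, "b": 0.0, "c": 2.0, "d": 2.0}
teams = [("a", "b"), ("c", "d")]

selfish = credo_reward("a", rewards, teams, (1.0, 0.0, 0.0))
team_only = credo_reward("a", rewards, teams, (0.0, 1.0, 0.0))
mixed = credo_reward("a", rewards, teams, (0.5, 0.5, 0.0))
```

Under this assumed form, a fully self-interested agent "a" optimizes its own reward, a fully team-focused one optimizes the team mean, and intermediate credos interpolate between the two, which is the kind of partial alignment the abstract evaluates.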


Bi-level Latent Variable Model for Sample-Efficient Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Despite their potential in real-world applications, multi-agent reinforcement learning (MARL) algorithms often suffer from high sample complexity. To address this issue, we present a novel model-based MARL algorithm, BiLL (Bi-Level Latent Variable Model-based Learning), that learns a bi-level latent variable model from high-dimensional inputs. At the top level, the model learns latent representations of the global state, which encode global information relevant to behavior learning. At the bottom level, it learns latent representations for each agent, given the global latent representations from the top level. The model generates latent trajectories to use for policy learning. We evaluate our algorithm on complex multi-agent tasks in the challenging SMAC and Flatland environments. Our algorithm outperforms state-of-the-art model-free and model-based baselines in sample efficiency, including on two extremely challenging Super Hard SMAC maps.


Shepherding Heterogeneous Flocks: Overview and Prospect

arXiv.org Artificial Intelligence

The problem of guiding a flock of several autonomous agents using repulsion force exerted by a smaller number of agents is called the shepherding problem and has been attracting attention due to its potential engineering applications. Although several works propose methodologies for achieving the shepherding task in this context, most assume that sheep agents have the same dynamics, which only sometimes holds in reality. The objective of this discussion paper is to overview a recent research trend addressing the gap mentioned above between the commonly placed uniformity assumption and the reality. Specifically, we first introduce recent guidance methods for heterogeneous flocks and then describe the prospects of the shepherding problem for heterogeneous flocks.


On Local Rewards and Scaling Distributed Reinforcement Learning

Neural Information Processing Systems

We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.


Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

arXiv.org Artificial Intelligence

We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.


Adaptive parallelization of multi-agent simulations with localized dynamics

arXiv.org Artificial Intelligence

Agent-based modelling constitutes a versatile approach to representing and simulating complex systems. Studying large-scale systems is challenging because of the computational time required for the simulation runs: scaling is at least linear in system size (number of agents). Given the inherently modular nature of multi-agent-based simulations (MABSs), parallel computing is a natural approach to overcoming this challenge. However, because of the shared information and communication between agents, parallelization is not simple. We present a protocol for shared-memory, parallel execution of MABSs. This approach is useful for models that can be formulated in terms of sequential computations, and that involve updates that are localized, in the sense of involving small numbers of agents. The protocol has a bottom-up and asynchronous nature, allowing it to deal with heterogeneous computation in an adaptive, yet graceful manner. We illustrate the potential performance gains on exemplar cultural dynamics and disease spreading MABSs.


Approximated Multi-Agent Fitted Q Iteration

arXiv.org Artificial Intelligence

We formulate an efficient approximation for multi-agent batch reinforcement learning, the approximated multi-agent fitted Q iteration (AMAFQI). We present a detailed derivation of our approach. We propose an iterative policy search and show that it yields a greedy policy with respect to multiple approximations of the centralized, learned Q-function. In each iteration and policy evaluation, AMAFQI requires a number of computations that scales linearly with the number of agents, whereas the analogous number of computations increases exponentially for the fitted Q iteration (FQI), a commonly used approach in batch reinforcement learning. This property of AMAFQI is fundamental for the design of a tractable multi-agent approach. We evaluate the performance of AMAFQI and compare it to FQI in numerical simulations. The simulations illustrate the significant computation time reduction when using AMAFQI instead of FQI in multi-agent problems and corroborate the similar performance of both approaches.
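The gap between exponential and linear scaling can be made concrete with a small counting sketch. It is not AMAFQI or FQI; it only assumes that a greedy step over a centralized Q-function enumerates every joint action, while a per-agent scheme scans each agent's actions separately. The function names and the choice of 4 actions per agent are hypothetical.

```python
from itertools import product


def joint_action_count(n_agents, n_actions):
    """Count joint actions by explicit enumeration (n_actions ** n_agents)."""
    return sum(1 for _ in product(range(n_actions), repeat=n_agents))


def per_agent_count(n_agents, n_actions):
    """Count evaluations when each agent scans only its own actions."""
    return n_agents * n_actions


# Compare growth for a few small team sizes, 4 actions per agent (assumed).
for n in (1, 2, 3, 5):
    print(n, joint_action_count(n, 4), per_agent_count(n, 4))
```

Even at five agents the enumeration covers over a thousand joint actions against twenty per-agent evaluations, which is why linear per-iteration cost matters for tractability.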


Effective and Stable Role-Based Multi-Agent Collaboration by Structural Information Principles

arXiv.org Artificial Intelligence

Role-based learning is a promising approach to improving the performance of Multi-Agent Reinforcement Learning (MARL). Nevertheless, without manual assistance, current role-based methods cannot guarantee stably discovering a set of roles to effectively decompose a complex task, as they assume either a predefined role structure or practical experience for selecting hyperparameters. In this article, we propose a mathematical Structural Information principles-based Role Discovery method, namely SIRD, and then present a SIRD optimizing MARL framework, namely SR-MARL, for multi-agent collaboration. The SIRD transforms role discovery into a hierarchical action space clustering. Specifically, the SIRD consists of structuralization, sparsification, and optimization modules, where an optimal encoding tree is generated to perform abstracting to discover roles. The SIRD is agnostic to specific MARL algorithms and can be flexibly integrated with various value function factorization approaches. Empirical evaluations on the StarCraft II micromanagement benchmark demonstrate that, compared with state-of-the-art MARL algorithms, the SR-MARL framework improves the average test win rate by 0.17%, 6.08%, and 3.24%, and reduces the deviation by 16.67%, 30.80%, and 66.30%, under easy, hard, and super hard scenarios, respectively.


Coordinating Fully-Cooperative Agents Using Hierarchical Learning Anticipation

arXiv.org Artificial Intelligence

Learning anticipation is a reasoning paradigm in multi-agent reinforcement learning, where agents, during learning, consider the anticipated learning of other agents. There has been substantial research into the role of learning anticipation in improving cooperation among self-interested agents in general-sum games. Two primary examples are Learning with Opponent-Learning Awareness (LOLA), which anticipates and shapes the opponent's learning process to ensure cooperation among self-interested agents in various games such as the iterated prisoner's dilemma, and Look-Ahead (LA), which uses learning anticipation to guarantee convergence in games with cyclic behaviors. So far, the effectiveness of applying learning anticipation to fully-cooperative games has not been explored. In this study, we aim to research the influence of learning anticipation on coordination among common-interested agents. We first illustrate that both LOLA and LA, when applied to fully-cooperative games, degrade coordination among agents, causing worst-case outcomes. Subsequently, to overcome this miscoordination behavior, we propose Hierarchical Learning Anticipation (HLA), where agents anticipate the learning of other agents in a hierarchical fashion. Specifically, HLA assigns agents to several hierarchy levels to properly regulate their reasoning. Our theoretical and empirical findings confirm that HLA can significantly improve coordination among common-interested agents in fully-cooperative normal-form games. With HLA, to the best of our knowledge, we are the first to unlock the benefits of learning anticipation for fully-cooperative games.


Factorization of Multi-Agent Sampling-Based Motion Planning

arXiv.org Artificial Intelligence

Modern robotics often involves multiple embodied agents operating within a shared environment. Path planning in these cases is considerably more challenging than in single-agent scenarios. Although standard Sampling-based Algorithms (SBAs) can be used to search for solutions in the robots' joint space, this approach quickly becomes computationally intractable as the number of agents increases. To address this issue, we integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods. During the search for a solution, we can decouple (i.e., factorize) different subsets of agents into independent lower-dimensional search spaces once we certify, using a factorization heuristic, that their future solutions will be independent of each other. Consequently, we progressively construct a lean hypergraph where certain (hyper-)edges split the agents into independent subgraphs. In the best case, this approach can reduce the growth in dimensionality of the search space from exponential to linear in the number of agents. On average, fewer samples are needed to find high-quality solutions while preserving the optimality, completeness, and anytime properties of SBAs. We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.