Agent Societies


A Game Benchmark for Real-Time Human-Swarm Control

arXiv.org Artificial Intelligence

We present a game benchmark for testing human-swarm control algorithms and interfaces in a real-time, high-cadence scenario. Our benchmark consists of a swarm vs. swarm game in a virtual ROS environment, in which the goal is to capture all agents of the opposing swarm; the game's high cadence results from the capture rules, which cause team sizes to fluctuate rapidly. These rules require players to consider both the number of agents currently at their disposal and the behavior of their opponent's swarm when planning actions. We demonstrate our game benchmark with a default human-swarm control system that lets a player interact with their swarm through a high-level touchscreen interface. The touchscreen interface transforms player gestures into swarm control commands via a low-level decentralized ergodic control framework. We compare our default human-swarm control system to a flocking-based control system and discuss traits that are crucial for swarm control algorithms and interfaces operating in real-time, high-cadence scenarios like our game benchmark. Our game benchmark code is available on GitHub; more information can be found at https://sites.google.com/view/swarm-game-benchmark.
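
The abstract does not state which ergodic objective the decentralized controller minimizes. A common choice in the ergodic control literature is the spectral ergodic metric of Mathew and Mezić, which compares the Fourier coefficients of the time-averaged trajectory with those of a target density. The sketch below assumes that metric, restricted to one dimension with a cosine basis; the function name and parameters are illustrative and are not taken from the benchmark code.

```python
import math

def ergodic_metric(trajectory, target_pdf, length=1.0, num_coeffs=10, grid=200):
    """Spectral ergodic metric in 1-D on [0, length] using a cosine basis.

    trajectory: visited positions (e.g. sampled agent states) in [0, length].
    target_pdf: unnormalised target density over [0, length].
    """
    s = (1 + 1) / 2  # Sobolev exponent (n + 1) / 2 with n = 1 dimension
    dx = length / grid
    xs = [(i + 0.5) * dx for i in range(grid)]  # midpoint quadrature grid
    weights = [target_pdf(x) for x in xs]
    total = sum(weights) * dx
    weights = [w / total for w in weights]  # normalise so the density integrates to 1

    metric = 0.0
    for k in range(num_coeffs):
        # scale each basis function to unit L2 norm on [0, length]
        h_k = math.sqrt(length) if k == 0 else math.sqrt(length / 2)
        # time-averaged coefficient of the trajectory
        c_k = sum(math.cos(k * math.pi * x / length) for x in trajectory) / (h_k * len(trajectory))
        # coefficient of the target distribution
        phi_k = sum(math.cos(k * math.pi * x / length) * w for x, w in zip(xs, weights)) * dx / h_k
        lam = (1 + k * k) ** (-s)  # low frequencies are weighted more heavily
        metric += lam * (c_k - phi_k) ** 2
    return metric
```

A controller built on this idea would steer agents so that the metric decreases over time; a trajectory that spreads evenly over a uniform target scores near zero, while one that lingers in a single spot scores high.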


Non-Linear Coordination Graphs

arXiv.org Artificial Intelligence

Value decomposition multi-agent reinforcement learning methods learn the global value function as a mixture of the agents' individual utility functions. Coordination graphs (CGs) represent a higher-order decomposition by incorporating pairwise payoff functions and should therefore have a more powerful representational capacity. However, CGs decompose the global value function linearly over local value functions, severely limiting the complexity of the value function class that can be represented. In this paper, we propose the first non-linear coordination graph by extending CG value decomposition beyond the linear case. One major challenge is performing greedy action selection in this new function class, to which the commonly adopted DCOP algorithms no longer apply. We study how to solve this problem when mixing networks with LeakyReLU activations are used. We propose an enumeration method with a global optimality guarantee, which in turn motivates an efficient iterative optimization method with a local optimality guarantee. We find that our method achieves superior performance on challenging multi-agent coordination tasks such as MACO.
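
The contrast between a globally optimal enumeration and a locally optimal iterative method can be illustrated on a toy problem. The sketch below assumes a hypothetical one-layer mixing in which every utility and pairwise payoff passes through a LeakyReLU before a weighted sum; this is not the paper's mixing network, and the helper names are invented for illustration. Enumeration scans every joint action, while coordinate ascent lets each agent best-respond in turn until no single-agent change helps.

```python
import itertools

def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x

def mixed_value(joint, utils, payoffs, w_u, w_p):
    """Hypothetical non-linear CG value: LeakyReLU on each local term, then a weighted sum."""
    value = sum(w_u[i] * leaky_relu(utils[i][a]) for i, a in enumerate(joint))
    value += sum(w_p[edge] * leaky_relu(q[joint[edge[0]]][joint[edge[1]]])
                 for edge, q in payoffs.items())
    return value

def greedy_by_enumeration(n_actions, utils, payoffs, w_u, w_p):
    # globally optimal, but the search space grows exponentially with the number of agents
    joints = itertools.product(*(range(n) for n in n_actions))
    return max(joints, key=lambda j: mixed_value(j, utils, payoffs, w_u, w_p))

def greedy_by_coordinate_ascent(n_actions, utils, payoffs, w_u, w_p, init):
    # each agent best-responds in turn; terminates at a local optimum
    joint = list(init)
    current = mixed_value(joint, utils, payoffs, w_u, w_p)
    improved = True
    while improved:
        improved = False
        for i, n in enumerate(n_actions):
            for a in range(n):
                candidate = joint[:i] + [a] + joint[i + 1:]
                value = mixed_value(candidate, utils, payoffs, w_u, w_p)
                if value > current:  # strict improvement guarantees termination
                    joint, current, improved = candidate, value, True
    return joint
```

With two agents whose pairwise payoff rewards both coordinated outcomes but one more than the other, coordinate ascent started at the weaker coordinated outcome stays there, while enumeration finds the better one.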


Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

We investigate the use of natural language to drive the generalization of policies in multi-agent settings. Unlike in single-agent settings, policy generalization must also account for the influence of other agents. Moreover, as the number of entities in multi-agent settings grows, more agent-entity interactions are needed for language grounding, and the enormous search space can impede learning. In addition, given a simple general instruction, e.g., beating all enemies, agents must decompose it into multiple subgoals and figure out which one to focus on. Inspired by previous work, we address these issues at the entity level and propose entity divider (EnDi), a novel framework for language grounding in multi-agent reinforcement learning. EnDi enables agents to independently learn subgoal division at the entity level and act in the environment based on the associated entities. The subgoal division is regularized by opponent modeling to avoid subgoal conflicts and promote coordinated strategies. Empirically, EnDi demonstrates strong generalization to unseen games with new dynamics and outperforms existing methods.


Conditional Goal-oriented Trajectory Prediction for Interacting Vehicles with Vectorized Representation

arXiv.org Artificial Intelligence

This paper tackles the interactive behavior prediction task and proposes a novel Conditional Goal-oriented Trajectory Prediction (CGTP) framework that jointly generates scene-compliant trajectories for two interacting agents. CGTP is an end-to-end, interpretable model with three main stages: context encoding, goal interactive prediction, and trajectory interactive prediction. First, a Goals-of-Interest Network (GoINet) extracts agent-to-agent and agent-to-goal interaction features using a graph-based vectorized representation. Next, the Conditional Goal Prediction Network (CGPNet) performs goal interactive prediction by combining marginal and conditional goal predictors. Finally, the Goal-oriented Trajectory Forecasting Network (GTFNet) performs trajectory interactive prediction via conditional goal-oriented predictors, taking the predicted future states of the other interacting agent as inputs. In addition, a new goal interactive loss is developed to better learn the joint probability distribution over goal candidates of the two interacting agents. The proposed method is evaluated on the Argoverse motion forecasting dataset, an in-house cut-in dataset, and the Waymo Open Motion Dataset, and the comparative results show that CGTP outperforms mainstream prediction methods.
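
Combining a marginal and a conditional goal predictor amounts to the chain-rule factorization P(g_A, g_B) = P(g_A) P(g_B | g_A). The sketch below assumes both predictors have already produced probability tables over a small, hypothetical set of goal labels (real goal candidates would be map locations) and shows how a joint distribution and its top-k goal pairs could be formed; it is not CGTP's network or loss.

```python
def joint_goal_distribution(marginal_a, conditional_b_given_a):
    """Chain rule: P(g_a, g_b) = P(g_a) * P(g_b | g_a)."""
    joint = {}
    for goal_a, p_a in marginal_a.items():
        for goal_b, p_b_given_a in conditional_b_given_a[goal_a].items():
            joint[(goal_a, goal_b)] = p_a * p_b_given_a
    return joint

def top_k_goal_pairs(joint, k):
    # the most likely joint goal pairs, e.g. to condition trajectory decoding on
    return sorted(joint.items(), key=lambda item: item[1], reverse=True)[:k]

# hypothetical goal labels for agent A (marginal) and agent B (conditioned on A)
marginal = {"left": 0.6, "straight": 0.4}
conditional = {"left": {"yield": 0.9, "go": 0.1}, "straight": {"yield": 0.2, "go": 0.8}}
pairs = top_k_goal_pairs(joint_goal_distribution(marginal, conditional), 2)
```

Here the two most likely pairs are ("left", "yield") and ("straight", "go"), which is the kind of coupled outcome a purely marginal predictor for each agent cannot express.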


Swarm Analytics: Designing Information Markers to Characterise Swarm Systems in Shepherding Contexts

arXiv.org Artificial Intelligence

Contemporary swarm indicators are often used in isolation, focusing on extracting information at either the individual or the collective level. Consequently, they are seldom integrated to infer a top-level operating picture of the swarm, its members, and its overall collective dynamics. The primary contribution of this paper is to organise a suite of swarm indicators into an ontologically arranged collection of information markers that characterise the swarm from the perspective of an external observer, namely a recognition agent. Our contribution lays the foundations for a new area of research that we term swarm analytics, whose primary concern is the design and organisation of collections of swarm markers to understand, detect, recognise, track, and learn a particular insight about a swarm system. We present our framework of information markers, which offers a new avenue for swarm research, especially for heterogeneous and cognitive swarms that may require more advanced capabilities to detect agencies and categorise agent influences and responses.
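
To make "swarm indicators" concrete, the sketch below computes two widely used collective-level measures that an external observer could extract from positions and headings alone: polarisation (alignment of headings) and cohesion (mean distance to the centroid). These are generic examples chosen for illustration; the abstract does not say that they belong to the paper's marker framework.

```python
import math

def centroid(positions):
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)

def cohesion(positions):
    """Mean distance of agents to the swarm centroid (smaller = tighter swarm)."""
    cx, cy = centroid(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)

def polarisation(headings):
    """Length of the mean unit heading vector: 1.0 = fully aligned, near 0 = disordered."""
    sx = sum(math.cos(h) for h in headings)
    sy = sum(math.sin(h) for h in headings)
    return math.hypot(sx, sy) / len(headings)
```

In an ontology of markers, measures like these would sit at the collective level, alongside individual-level markers such as a single agent's speed or turning rate.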


Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation

arXiv.org Artificial Intelligence

Deep reinforcement learning in continuous domains focuses on learning control policies that map states to distributions over actions that ideally concentrate on the optimal choices at each step. In multi-agent navigation problems, the optimal actions depend heavily on the agents' density, and their interaction patterns grow exponentially with that density, making it hard for learning-based methods to generalize. We propose to switch the learning objective from predicting optimal actions to predicting sets of admissible actions, which we call control admissibility models (CAMs), so that they can be easily composed and used for online inference with an arbitrary number of agents. We design CAMs using graph neural networks and develop training methods that optimize CAMs in the standard model-free setting, with the additional benefit of eliminating the reward engineering typically required to balance collision avoidance and goal-reaching requirements. We evaluate the proposed approach in multi-agent navigation environments. We show that CAMs can be trained in environments with only a few agents and easily composed for deployment in dense environments with hundreds of agents, outperforming state-of-the-art methods.
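
The composition argument can be illustrated without any learning: if admissibility is judged per neighbour, the admissible set for any number of neighbours is the intersection of the per-neighbour sets, and the agent then picks the admissible action that makes the most goal progress. In the sketch below, the hand-coded distance check in `admissible_wrt` is a hypothetical stand-in for the learned GNN score, neighbours are assumed stationary over one step, and the discrete action set is illustrative.

```python
import math

# eight unit-speed headings plus "stay"; a hypothetical discretisation of the action space
ACTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)] + [(0.0, 0.0)]

def next_position(pos, action, dt):
    return (pos[0] + dt * action[0], pos[1] + dt * action[1])

def admissible_wrt(agent_pos, action, neighbour_pos, dt=0.5, radius=1.0):
    # stand-in for a learned pairwise admissibility model: keep a safety radius
    return math.dist(next_position(agent_pos, action, dt), neighbour_pos) > radius

def choose_action(agent_pos, goal, neighbours, dt=0.5, radius=1.0):
    # composition: an action is admissible only if every neighbour admits it,
    # so the same rule works for any number of neighbours
    candidates = [a for a in ACTIONS
                  if all(admissible_wrt(agent_pos, a, nb, dt, radius) for nb in neighbours)]
    if not candidates:
        return (0.0, 0.0)  # fall back to stopping if nothing is admissible
    # among admissible actions, pick the one that ends closest to the goal
    return min(candidates, key=lambda a: math.dist(next_position(agent_pos, a, dt), goal))
```

Because collision avoidance is encoded as a hard filter rather than a penalty term, no reward weight has to trade it off against goal reaching, which mirrors the motivation given in the abstract.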


Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL

arXiv.org Artificial Intelligence

The framework of cooperative Multi-Agent Reinforcement Learning (MARL) with permutation-invariant agents has achieved tremendous empirical success in real-world applications. Unfortunately, theoretical understanding of this MARL problem is lacking, due to the curse of many agents and the limited exploration of relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents, respectively, which mitigates the curse of many agents. These results follow from a novel generalization error bound for the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents. Our improved generalization bound may be of independent interest and applies to other transformer-related regression problems beyond MARL.
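
Why a transformer suits permutation-invariant agents can be seen in a few lines: self-attention without positional encodings is permutation-equivariant, and mean pooling over its outputs makes the result permutation-invariant. The sketch below is a single-head, plain-Python illustration with arbitrary weight matrices; it is not the architecture analyzed in the paper.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def set_encoder(entities, Wq, Wk, Wv):
    """Single-head self-attention (no positional encoding) followed by mean pooling."""
    q = [matvec(Wq, e) for e in entities]
    k = [matvec(Wk, e) for e in entities]
    v = [matvec(Wv, e) for e in entities]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z for c in range(len(v[0]))])
    # mean pooling over the set removes any dependence on agent ordering
    return [sum(o[c] for o in out) / len(out) for c in range(len(out[0]))]
```

Reordering the agents only reorders the rows of the attention output, and the pooled summary is unchanged up to floating-point rounding.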


Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning has drawn increasing attention in practice, e.g., in robotics and autonomous driving, as it can explore optimal policies using samples generated by interacting with the environment. However, high reward uncertainty remains a problem when training a satisfactory model, because obtaining high-quality reward feedback is usually expensive or even infeasible. To handle this issue, previous methods mainly focus on passive reward correction, while recent active reward estimation methods have proven effective at reducing the effect of reward uncertainty. In this paper, we propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL). Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training. Specifically, the multi-action-branch reward estimation models reward distributions on all action branches, and reward aggregation then yields stable update signals during training. Our intuition is that considering all possible consequences of actions can be useful for learning policies. The superiority of DRE-MARL over SOTA baselines is demonstrated on benchmark multi-agent scenarios in terms of both effectiveness and robustness.
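
Policy-weighted aggregation can be read as taking the expectation of the estimated reward over all action branches under the current policy, rather than using only the noisy reward of the action actually taken. The sketch below replaces the paper's learned estimator with a hypothetical tabular model that keeps a running mean and variance per action branch (Welford's online algorithm); only the aggregation step reflects the idea described above.

```python
class BranchRewardModel:
    """Hypothetical tabular stand-in: one running reward estimate per action branch."""

    def __init__(self, n_actions):
        self.count = [0] * n_actions
        self.mean = [0.0] * n_actions
        self.m2 = [0.0] * n_actions  # sum of squared deviations, for the variance

    def update(self, action, reward):
        # Welford's online mean/variance update for the observed branch
        self.count[action] += 1
        delta = reward - self.mean[action]
        self.mean[action] += delta / self.count[action]
        self.m2[action] += delta * (reward - self.mean[action])

    def variance(self, action):
        n = self.count[action]
        return self.m2[action] / (n - 1) if n > 1 else 0.0

    def aggregate(self, policy_probs):
        # policy-weighted expectation over all branches: sum_a pi(a|s) * mean_a
        return sum(p * mu for p, mu in zip(policy_probs, self.mean))
```

For example, after rewards 1.0 and 3.0 on branch 0 and 10.0 on branch 1, a uniform policy yields an aggregated signal of 6.0, regardless of which branch was sampled last.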


Decentralized Coverage Path Planning with Reinforcement Learning and Dual Guidance

arXiv.org Artificial Intelligence

Planning coverage paths for multiple robots in a decentralized way makes coverage tasks more robust to unpredictable malfunctions. For each robot to achieve high efficiency in a distributed manner, a comprehensive understanding of both the complex environment and the cooperating agents' intentions is crucial. Unfortunately, existing works commonly consider only some of these factors, resulting in imbalanced subareas or unnecessary overlaps. To tackle this issue, we introduce a decentralized reinforcement learning framework with dual guidance that trains each agent to solve the decentralized multiple coverage path planning problem directly from the environment states. Because distributed robots need the others' intentions to cover efficiently, we use two guidance methods, artificial potential fields and heuristic guidance, to integrate the other robots' intentions into each robot's observations. With this framework, our agents learn to determine their own subareas while achieving full coverage, balanced subareas, and low overlap rates. We then apply spanning tree coverage within those subareas to construct actual routes for each robot and complete the given coverage tasks. Compared with the state-of-the-art decentralized method, our approach achieves up to 10 percent lower overlap rates while maintaining high efficiency in similar environments.
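
An artificial potential field is one way to fold other robots' intentions into a robot's observation: uncovered cells attract, and cells near other robots repel, so the resulting map can be fed to the policy as an extra channel. The sketch below assumes a simple grid world with hypothetical gains `k_att` and `k_rep`; the abstract does not specify the paper's field shape or parameters.

```python
import math

def potential_field(width, height, covered, other_robots, k_att=1.0, k_rep=2.0, eps=1e-6):
    """Hypothetical guidance map on a grid: higher value = more attractive cell."""
    uncovered = [(x, y) for y in range(height) for x in range(width) if (x, y) not in covered]
    field = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # attraction grows with the number of nearby uncovered cells
            att = sum(k_att / (1.0 + math.dist((x, y), c)) for c in uncovered)
            # repulsion discourages moving toward cells other robots are likely to cover
            rep = sum(k_rep / (eps + math.dist((x, y), r)) for r in other_robots)
            field[y][x] = att - rep
    return field

def best_neighbour(field, pos):
    # greedy 4-connected step toward the highest field value
    x, y = pos
    h, w = len(field), len(field[0])
    moves = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= x + dx < w and 0 <= y + dy < h]
    return max(moves, key=lambda p: field[p[1]][p[0]])
```

In a learned setting the field would be an input to the policy rather than a controller in its own right; the greedy step here only shows that the map pushes a robot away from its teammates and toward unvisited space.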


Multi-agent Dynamic Algorithm Configuration

arXiv.org Artificial Intelligence

A popular paradigm for algorithm configuration tuning is dynamic algorithm configuration (DAC), in which an agent learns dynamic configuration policies across instances by reinforcement learning (RL). However, many complex algorithms have different types of configuration hyperparameters, and this heterogeneity can cause difficulties for classic DAC, which uses a single-agent RL policy. In this paper, we address this issue and propose multi-agent DAC (MA-DAC), with one agent handling each type of configuration hyperparameter. MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it with a cooperative multi-agent RL (MARL) algorithm. As an instantiation, we apply MA-DAC to a well-known optimization algorithm for multi-objective optimization problems. Experimental results show that MA-DAC not only achieves superior performance compared with other configuration tuning approaches based on heuristic rules, multi-armed bandits, and single-agent RL, but also generalizes to different problem classes. Furthermore, we release the environments used in this paper as a benchmark for testing MARL algorithms, in the hope of facilitating the application of MARL.