Agent Societies
Facility Location Problem with Aleatory Agents
Auricchio, Gennaro, Zhang, Jie
In this paper, we introduce and study the Facility Location Problem with Aleatory Agents (FLPAA), where the facility accommodates n agents larger than the number of agents reporting their preferences, namely n_r. The spare capacity is used by n_u=n-n_r aleatory agents sampled from a probability distribution \mu. The goal of FLPAA is to find a location that minimizes the ex-ante social cost, which is the expected cost of the n_u agents sampled from \mu plus the cost incurred by the agents reporting their position. We investigate the mechanism design aspects of the FLPAA under the assumption that the Mechanism Designer (MD) lacks knowledge of the distribution $\mu$ but can query k quantiles of \mu. We explore the trade-off between acquiring more insights into the probability distribution and designing a better-performing mechanism, which we describe through the strong approximation ratio (SAR). The SAR of a mechanism measures the highest ratio between the cost of the mechanisms and the cost of the optimal solution on the worst-case input x and worst-case distribution \mu, offering a metric for efficiency that does not depend on \mu. We divide our study into four different information settings: the zero information case, in which the MD has access to no quantiles; the median information case, in which the MD has access to the median of \mu; the n_u-quantile information case, in which the MD has access to n_u quantiles of its choice, and the k-quantile information case, in which the MD has access to k
A Robust, Task-Agnostic and Fully-Scalable Voxel Mapping System for Large Scale Environments
La, Jinche, Kang, Jun-Gill, Lee, Dasol
Perception still remains a challenging problem for autonomous navigation in unknown environment, especially for aerial vehicles. Most mapping algorithms for autonomous navigation are specifically designed for their very intended task, which hinders extended usage or cooperative task. In this paper, we propose a voxel mapping system that can build an adaptable map for multiple tasks. The system employs hash table-based map structure and manages each voxel with spatial and temporal priorities without explicit map boundary. We also introduce an efficient map-sharing feature with minimal bandwidth to enable multi-agent applications. We tested the system in real world and simulation environment by applying it for various tasks including local mapping, global mapping, cooperative multi-agent navigation, and high-speed navigation. Our system proved its capability to build customizable map with high resolution, wide coverage, and real-time performance regardless of sensor and environment. The system can build a full-resolution map using the map-sharing feature, with over 95 % of bandwidth reduction from raw sensor data.
Linear Contextual Bandits with Interference
Xu, Yang, Lu, Wenbin, Song, Rui
Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and a synthetic data generated based on MovieLens data.
Loopy Movements: Emergence of Rotation in a Multicellular Robot
Unlike most human-engineered systems, many biological systems rely on emergent behaviors from low-level interactions, enabling greater diversity and superior adaptation to complex, dynamic environments. This study explores emergent decentralized rotation in the Loopy multicellular robot, composed of homogeneous, physically linked, 1-degree-of-freedom cells. Inspired by biological systems like sunflowers, Loopy uses simple local interactions-diffusion, reaction, and active transport of simulated chemicals, called morphogens-without centralized control or knowledge of its global morphology. Through these interactions, the robot self-organizes to achieve coordinated rotational motion and forms lobes-local protrusions created by clusters of motor cells. This study investigates how these interactions drive Loopy's rotation, the impact of its morphology, and its resilience to actuator failures. Our findings reveal two distinct behaviors: 1) inner valleys between lobes rotate faster than the outer peaks, contrasting with rigid body dynamics, and 2) cells rotate in the opposite direction of the overall morphology. The experiments show that while Loopy's morphology does not affect its angular velocity relative to its cells, larger lobes increase cellular rotation and decrease morphology rotation relative to the environment. Even with up to one-third of its actuators disabled and significant morphological changes, Loopy maintains its rotational abilities, highlighting the potential of decentralized, bio-inspired strategies for resilient and adaptable robotic systems.
On the Hardness of Decentralized Multi-Agent Policy Evaluation under Byzantine Attacks
Hairi, null, Fang, Minghong, Zhang, Zifan, Velasquez, Alvaro, Liu, Jia
In this paper, we study a fully-decentralized multi-agent policy evaluation problem, which is an important sub-problem in cooperative multi-agent reinforcement learning, in the presence of up to $f$ faulty agents. In particular, we focus on the so-called Byzantine faulty model with model poisoning setting. In general, policy evaluation is to evaluate the value function of any given policy. In cooperative multi-agent system, the system-wide rewards are usually modeled as the uniform average of rewards from all agents. We investigate the multi-agent policy evaluation problem in the presence of Byzantine agents, particularly in the setting of heterogeneous local rewards. Ideally, the goal of the agents is to evaluate the accumulated system-wide rewards, which are uniform average of rewards of the normal agents for a given policy. It means that all agents agree upon common values (the consensus part) and furthermore, the consensus values are the value functions (the convergence part). However, we prove that this goal is not achievable. Instead, we consider a relaxed version of the problem, where the goal of the agents is to evaluate accumulated system-wide reward, which is an appropriately weighted average reward of the normal agents. We further prove that there is no correct algorithm that can guarantee that the total number of positive weights exceeds $|\mathcal{N}|-f $, where $|\mathcal{N}|$ is the number of normal agents. Towards the end, we propose a Byzantine-tolerant decentralized temporal difference algorithm that can guarantee asymptotic consensus under scalar function approximation. We then empirically test the effective of the proposed algorithm.
Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning
Formanek, Claude, Beyers, Louise, Tilbury, Callum Rhys, Shock, Jonathan P., Pretorius, Arnu
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development. These contributions are all publicly available on our website.
Foragax: An Agent-Based Modelling Framework Based on JAX
Chaturvedi, Siddharth, El-Gazzar, Ahmed, van Gerven, Marcel
Foraging for resources is a ubiquitous activity conducted by living organisms in a shared environment to maintain their homeostasis. Modelling multi-agent foraging in-silico allows us to study both individual and collective emergent behaviour in a tractable manner. Agent-based modelling has proven to be effective in simulating such tasks, though scaling the simulations to accommodate large numbers of agents with complex dynamics remains challenging. In this work, we present Foragax, a general-purpose, scalable, hardware-accelerated, multi-agent foraging toolkit. Leveraging the JAX library, our toolkit can simulate thousands of agents foraging in a common environment, in an end-to-end vectorized and differentiable manner. The toolkit provides agent-based modelling tools to model various foraging tasks, including options to design custom spatial and temporal agent dynamics, control policies, sensor models, and boundary conditions. Further, the number of agents during such simulations can be increased or decreased based on custom rules. While applied to foraging, the toolkit can also be used to model and simulate a wide range of other multi-agent scenarios.
On the limits of agency in agent-based models
Chopra, Ayush, Kumar, Shashank, Giray-Kuru, Nurullah, Raskar, Ramesh, Quera-Bofarull, Arnau
Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a collection of agents that act and interact within an environment. Their practical utility requires capturing realistic environment dynamics and adaptive agent behavior while efficiently simulating million-size populations. Recent advancements in large language models (LLMs) present an opportunity to enhance ABMs by using LLMs as agents with further potential to capture adaptive behavior. However, the computational infeasibility of using LLMs for large populations has hindered their widespread adoption. In this paper, we introduce AgentTorch -- a framework that scales ABMs to millions of agents while capturing high-resolution agent behavior using LLMs. We benchmark the utility of LLMs as ABM agents, exploring the trade-off between simulation scale and individual agency. Using the COVID-19 pandemic as a case study, we demonstrate how AgentTorch can simulate 8.4 million agents representing New York City, capturing the impact of isolation and employment behavior on health and economic outcomes. We compare the performance of different agent architectures based on heuristic and LLM agents in predicting disease waves and unemployment rates. Furthermore, we showcase AgentTorch's capabilities for retrospective, counterfactual, and prospective analyses, highlighting how adaptive agent behavior can help overcome the limitations of historical data in policy design. AgentTorch is an open-source project actively being used for policy-making and scientific discovery around the world. The framework is available here: github.com/AgentTorch/AgentTorch.
A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning
Yu, Yinbo, Yan, Saihao, Liu, Jiajia
Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor trigger is observed, it will perform abnormal actions leading to failures or malicious goals. However, existing proposed backdoors suffer from several issues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is trained or activated by an additional network, or all agents are backdoored. To this end, in this paper, we propose a novel backdoor attack against c-MADRL, which attacks the entire multi-agent team by embedding the backdoor only in a single agent. Firstly, we introduce adversary spatiotemporal behavior patterns as the backdoor trigger rather than manual-injected fixed visual patterns or instant status and control the attack duration. This method can guarantee the stealthiness and practicality of injected backdoors. Secondly, we hack the original reward function of the backdoored agent via reward reverse and unilateral guidance during training to ensure its adverse influence on the entire team. We evaluate our backdoor attacks on two classic c-MADRL algorithms VDN and QMIX, in a popular c-MADRL environment SMAC. The experimental results demonstrate that our backdoor attacks are able to reach a high attack success rate (91.6\%) while maintaining a low clean performance variance rate (3.7\%).
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
Nachkov, Asen, Paudel, Danda Pani, Van Gool, Luc
Current methods to learn controllers for autonomous vehicles (AVs) focus on behavioural cloning. Being trained only on exact historic data, the resulting agents often generalize poorly to novel scenarios. Simulators provide the opportunity to go beyond offline datasets, but they are still treated as complicated black boxes, only used to update the global simulation state. As a result, these RL algorithms are slow, sample-inefficient, and prior-agnostic. In this work, we leverage a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers on the large-scale Waymo Open Motion Dataset. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of the environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We combine this setup with a recurrent architecture that can efficiently propagate temporal information across long simulated trajectories. This APG method allows us to learn robust, accurate, and fast policies, while only requiring widely-available expert trajectories, instead of scarce expert actions. We compare to behavioural cloning and find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.