Agent Societies
Analysing the Sample Complexity of Opponent Shaping
Fung, Kitty, Zhang, Qizhen, Lu, Chris, Wan, Jia, Willi, Timon, Foerster, Jakob
Learning in general-sum games often yields collectively sub-optimal results. Addressing this, opponent shaping (OS) methods actively guide the learning processes of other agents, empirically leading to improved individual and group performances in many settings. Early OS methods use higher-order derivatives to shape the learning of co-players, making them unsuitable for shaping multiple learning steps. Follow-up work, Model-free Opponent Shaping (M-FOS), addresses these by reframing the OS problem as a meta-game. In contrast to early OS methods, there is little theoretical understanding of the M-FOS framework. Providing theoretical guarantees for M-FOS is hard because A) there is little literature on theoretical sample complexity bounds for meta-reinforcement learning B) M-FOS operates in continuous state and action spaces, so theoretical analysis is challenging. In this work, we present R-FOS, a tabular version of M-FOS that is more suitable for theoretical analysis. R-FOS discretises the continuous meta-game MDP into a tabular MDP. Within this discretised MDP, we adapt the $R_{max}$ algorithm, most prominently used to derive PAC-bounds for MDPs, as the meta-learner in the R-FOS algorithm. We derive a sample complexity bound that is exponential in the cardinality of the inner state and action space and the number of agents. Our bound guarantees that, with high probability, the final policy learned by an R-FOS agent is close to the optimal policy, apart from a constant factor. Finally, we investigate how R-FOS's sample complexity scales in the size of state-action space. Our theoretical results on scaling are supported empirically in the Matching Pennies environment.
Optimizing Delegation in Collaborative Human-AI Hybrid Teams
Fuchs, Andrew, Passarella, Andrea, Conti, Marco
When humans and autonomous systems operate together as what we refer to as a hybrid team, we of course wish to ensure the team operates successfully and effectively. We refer to team members as agents. In our proposed framework, we address the case of hybrid teams in which, at any time, only one team member (the control agent) is authorized to act as control for the team. To determine the best selection of a control agent, we propose the addition of an AI manager (via Reinforcement Learning) which learns as an outside observer of the team. The manager learns a model of behavior linking observations of agent performance and the environment/world the team is operating in, and from these observations makes the most desirable selection of a control agent. We restrict the manager task by introducing a set of constraints. The manager constraints indicate acceptable team operation, so a violation occurs if the team enters a condition which is unacceptable and requires manager intervention. To ensure minimal added complexity or potential inefficiency for the team, the manager should attempt to minimize the number of times the team reaches a constraint violation and requires subsequent manager intervention. Therefore our manager is optimizing its selection of authorized agents to boost overall team performance while minimizing the frequency of manager intervention. We demonstrate our manager performance in a simulated driving scenario representing the case of a hybrid team of agents composed of a human driver and autonomous driving system. We perform experiments for our driving scenario with interfering vehicles, indicating the need for collision avoidance and proper speed control. Our results indicate a positive impact of our manager, with some cases resulting in increased team performance up to ~187% that of the best solo agent performance.
Learning Collective Behaviors from Observation
We present a comprehensive examination of learning methodologies employed for the structural identification of dynamical systems. These techniques are designed to elucidate emergent phenomena within intricate systems of interacting agents. Our approach not only ensures theoretical convergence guarantees but also exhibits computational efficiency when handling high-dimensional observational data. The methods adeptly reconstruct both first- and second-order dynamical systems, accommodating observation and stochastic noise, intricate interaction rules, absent interaction features, and real-world observations in agent systems. The foundational aspect of our learning methodologies resides in the formulation of tailored loss functions using the variational inverse problem approach, inherently equipping our methods with dimension reduction capabilities.
Investigating Driving Interactions: A Robust Multi-Agent Simulation Framework for Autonomous Vehicles
Kaufeld, Marc, Trauth, Rainer, Betz, Johannes
Current validation methods often rely on recorded data and basic functional checks, which may not be sufficient to encompass the scenarios an autonomous vehicle might encounter. In addition, there is a growing need for complex scenarios with changing vehicle interactions for comprehensive validation. This work introduces a novel synchronous multi-agent simulation framework for autonomous vehicles in interactive scenarios. Our approach creates an interactive scenario and incorporates publicly available edge-case scenarios wherein simulated vehicles are replaced by agents navigating to predefined destinations. We provide a platform that enables the integration of different autonomous driving planning methodologies and includes a set of evaluation metrics to assess autonomous driving behavior. Our study explores different planning setups and adjusts simulation complexity to test the framework's adaptability and performance. Results highlight the critical role of simulating vehicle interactions to enhance autonomous driving systems. Our setup offers unique insights for developing advanced algorithms for complex driving tasks to accelerate future investigations and developments in this field. The multi-agent simulation framework is available as open-source software: https://github.com/TUM-AVS/Frenetix-Motion-Planner
Carthago Delenda Est: Co-opetitive Indirect Information Diffusion Model for Influence Operations on Online Social Media
Low, Jwen Fai, Fung, Benjamin C. M., Iqbal, Farkhund, Fachkha, Claude
Planning and/or defending against decentralized info ops can be aided by computational simulations in lieu of ethically-fraught live experiments on social media. In this study, we introduce Diluvsion, an agent-based model for contested information propagation efforts on Twitter-like social media. The model emphasizes a user's belief in an opinion (stance) being impacted by the perception of potentially illusory popular support from constant incoming floods of indirect information, floods that can be cooperatively engineered in an uncoordinated manner by bots as they compete to spread their stances. Our model, which has been validated against real-world data, is an advancement over previous models because we account for engagement metrics in influencing stance adoption, non-social tie spreading of information, neutrality as a stance that can be spread, and themes that are analogous to media's framing effect and are symbiotic with respect to stance propagation. The strengths of the Diluvsion model are demonstrated in simulations of orthodox info ops, e.g., maximizing adoption of one stance; creating echo chambers; inducing polarization; and unorthodox info ops, e.g., simultaneous support of multiple stances as a Trojan horse tactic for the dissemination of a theme.
Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning
Toquebiau, Maxime, Bredeche, Nicolas, Benamar, Faïz, Jun, Jae-Yun
Multi-agent deep reinforcement learning (MADRL) problems often encounter the challenge of sparse rewards. This challenge becomes even more pronounced when coordination among agents is necessary. As performance depends not only on one agent's behavior but rather on the joint behavior of multiple agents, finding an adequate solution becomes significantly harder. In this context, a group of agents can benefit from actively exploring different joint strategies in order to determine the most efficient one. In this paper, we propose an approach for rewarding strategies where agents collectively exhibit novel behaviors. We present JIM (Joint Intrinsic Motivation), a multi-agent intrinsic motivation method that follows the centralized learning with decentralized execution paradigm. JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments. We demonstrate the strengths of this approach both in a synthetic environment designed to reveal shortcomings of state-of-the-art MADRL methods, and in simulated robotic tasks. Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
Approximating the Core via Iterative Coalition Sampling
Gemp, Ian, Lanctot, Marc, Marris, Luke, Mao, Yiran, Duéñez-Guzmán, Edgar, Perrin, Sarah, Gyorgy, Andras, Elie, Romuald, Piliouras, Georgios, Kaisers, Michael, Hennes, Daniel, Bullard, Kalesha, Larson, Kate, Bachrach, Yoram
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches; namely solving large linear programs. As such, they scale better to very large problems as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
Fairness and Privacy Guarantees in Federated Contextual Bandits
Solanki, Sambhav, Jain, Shweta, Gujar, Sujit
This paper considers the contextual multi-armed bandit (CMAB) problem with fairness and privacy guarantees in a federated environment. We consider merit-based exposure as the desired fair outcome, which provides exposure to each action in proportion to the reward associated. We model the algorithm's effectiveness using fairness regret, which captures the difference between fair optimal policy and the policy output by the algorithm. Applying fair CMAB algorithm to each agent individually leads to fairness regret linear in the number of agents. We propose that collaborative -- federated learning can be more effective and provide the algorithm Fed-FairX-LinUCB that also ensures differential privacy. The primary challenge in extending the existing privacy framework is designing the communication protocol for communicating required information across agents. A naive protocol can either lead to weaker privacy guarantees or higher regret. We design a novel communication protocol that allows for (i) Sub-linear theoretical bounds on fairness regret for Fed-FairX-LinUCB and comparable bounds for the private counterpart, Priv-FairX-LinUCB (relative to single-agent learning), (ii) Effective use of privacy budget in Priv-FairX-LinUCB. We demonstrate the efficacy of our proposed algorithm with extensive simulations-based experiments. We show that both Fed-FairX-LinUCB and Priv-FairX-LinUCB achieve near-optimal fairness regret.
Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs
Triantafyllou, Stelios, Sukovic, Aleksa, Mandal, Debmalya, Radanovic, Goran
Establishing causal relationships between actions and outcomes is fundamental for accountable multi-agent decision-making. However, interpreting and quantifying agents' contributions to such relationships pose significant challenges. These challenges are particularly prominent in the context of multi-agent sequential decision-making, where the causal effect of an agent's action on the outcome depends on how other agents respond to that action. In this paper, our objective is to present a systematic approach for attributing the causal effects of agents' actions to the influence they exert on other agents. Focusing on multi-agent Markov decision processes, we introduce agent-specific effects (ASE), a novel causal quantity that measures the effect of an agent's action on the outcome that propagates through other agents. We then turn to the counterfactual counterpart of ASE (cf-ASE), provide a sufficient set of conditions for identifying cf-ASE, and propose a practical sampling-based algorithm for estimating it. Finally, we experimentally evaluate the utility of cf-ASE through a simulation-based testbed, which includes a sepsis management environment.
More Agents Is All You Need
Li, Junyou, Zhang, Qin, Yu, Yangbin, Fu, Qiang, Ye, Deheng
We find that, simply via a sampling-and-voting method, the performance of large language models (LLMs) scales with the number of agents instantiated. Also, this method is orthogonal to existing complicated methods to further enhance LLMs, while the degree of enhancement is correlated to the task difficulty. We conduct comprehensive experiments on a wide range of LLM benchmarks to verify the presence of our finding, and to study the properties that can facilitate its occurrence. Our code is publicly available at: Git.