Agent Societies
Learning Safe Control for Multi-Robot Systems: Methods, Verification, and Open Challenges
Garg, Kunal, Zhang, Songyuan, So, Oswin, Dawson, Charles, Fan, Chuchu
In this survey, we review the recent advances in control design methods for robotic multi-agent systems (MAS), focussing on learning-based methods with safety considerations. We start by reviewing various notions of safety and liveness properties, and modeling frameworks used for problem formulation of MAS. Then we provide a comprehensive review of learning-based methods for safe control design for multi-robot systems. We start with various types of shielding-based methods, such as safety certificates, predictive filters, and reachability tools. Then, we review the current state of control barrier certificate learning in both a centralized and distributed manner, followed by a comprehensive review of multi-agent reinforcement learning with a particular focus on safety. Next, we discuss the state-of-the-art verification tools for the correctness of learning-based methods. Based on the capabilities and the limitations of the state of the art methods in learning and verification for MAS, we identify various broad themes for open challenges: how to design methods that can achieve good performance along with safety guarantees; how to decompose single-agent based centralized methods for MAS; how to account for communication-related practical issues; and how to assess transfer of theoretical guarantees to practice.
Approximate Linear Programming and Decentralized Policy Improvement in Cooperative Multi-agent Markov Decision Processes
Mandal, Lakshmi, Lakshminarayanan, Chandrashekar, Bhatnagar, Shalabh
In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m greater than 1 agents, where all agents are aware of the system model. At each decision epoch, all the m agents cooperatively select actions in order to maximize a common long-term objective. Since the number of actions grows exponentially in the number of agents, policy improvement is computationally expensive. Recent works have proposed using decentralized policy improvement in which each agent assumes that the decisions of the other agents are fixed and it improves its decisions unilaterally. Yet, in these works, exact values are computed. In our work, for cooperative multi-agent finite and infinite horizon discounted MDPs, we propose suitable approximate policy iteration algorithms, wherein we use approximate linear programming to compute the approximate value function and use decentralized policy improvement. Thus our algorithms can handle both large number of states as well as multiple agents. We provide theoretical guarantees for our algorithms and also demonstrate the performance of our algorithms on some numerical examples.
ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning
Jin, Yizhao, Slabaugh, Greg, Lucas, Simon
Lastly, in scenarios where multiple agents are present, the behavioral mixture of agents approach, for example Vinyals et al. (2019) samples the final agent from the Nash distribution of the set of agents, can be utilized. Given that different agents, or experts, may recommend varying actions for an identical state, this results in an intrinsic stochastic policy, taking advantage of the diversity in agent decisions. If the state space is continuous, a common approach is to transform the actions into a normal or beta distribution. We apply one-hot encoding with temperature-scaled softmax. A discrete action space can be represented as a one-hot encoded vector, For instance, if action 2 out of 5 is chosen, its one-hot representation is [0, 1, 0, 0, 0], the scale the one-hot vector to [0, 1/τ, 0, 0, 0]. The higher the temperature coefficient τ, the more spread out the distribution becomes, while a lower temperature coefficient nudges the distribution closer to a deterministic action.
Many learning agents interacting with an agent-based market model
Dicks, Matthew, Paskaramoorthy, Andrew, Gebbie, Tim
We consider the dynamics and the interactions of multiple reinforcement learning optimal execution trading agents interacting with a reactive Agent-Based Model (ABM) of a financial market in event time. The model represents a market ecology with 3-trophic levels represented by: optimal execution learning agents, minimally intelligent liquidity takers, and fast electronic liquidity providers. The optimal execution agent classes include buying and selling agents that can either use a combination of limit orders and market orders, or only trade using market orders. The reward function explicitly balances trade execution slippage against the penalty of not executing the order timeously. This work demonstrates how multiple competing learning agents impact a minimally intelligent market simulation as functions of the number of agents, the size of agents' initial orders, and the state spaces used for learning. We use phase space plots to examine the dynamics of the ABM, when various specifications of learning agents are included. Further, we examine whether the inclusion of optimal execution agents that can learn is able to produce dynamics with the same complexity as empirical data. We find that the inclusion of optimal execution agents changes the stylised facts produced by ABM to conform more with empirical data, and are a necessary inclusion for ABMs investigating market micro-structure. However, including execution agents to chartist-fundamentalist-noise ABMs is insufficient to recover the complexity observed in empirical data.
Evaluating LLM Agent Group Dynamics against Human Group Dynamics: A Case Study on Wisdom of Partisan Crowds
Chuang, Yun-Shiuan, Suresh, Siddharth, Harlalka, Nikunj, Goyal, Agam, Hawkins, Robert, Yang, Sijia, Shah, Dhavan, Hu, Junjie, Rogers, Timothy T.
This study investigates the potential of Large Language Models (LLMs) to simulate human group dynamics, particularly within politically charged contexts. We replicate the Wisdom of Partisan Crowds phenomenon using LLMs to role-play as Democrat and Republican personas, engaging in a structured interaction akin to human group study. Our approach evaluates how agents' responses evolve through social influence. Our key findings indicate that LLM agents role-playing detailed personas and without Chain-of-Thought (CoT) reasoning closely align with human behaviors, while having CoT reasoning hurts the alignment. However, incorporating explicit biases into agent prompts does not necessarily enhance the wisdom of partisan crowds. Moreover, fine-tuning LLMs with human data shows promise in achieving human-like behavior but poses a risk of overfitting certain behaviors. These findings show the potential and limitations of using LLM agents in modeling human group phenomena.
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion
Pronovost, Ethan, Ganesina, Meghana Reddy, Hendy, Noureldin, Wang, Zeyu, Morales, Andres, Wang, Kai, Roy, Nicholas
Automated creation of synthetic traffic scenarios is a key part of validating the safety of autonomous vehicles (AVs). In this paper, we propose Scenario Diffusion, a novel diffusion-based architecture for generating traffic scenarios that enables controllable scenario generation. We combine latent diffusion, object detection and trajectory regression to generate distributions of synthetic agent poses, orientations and trajectories simultaneously. To provide additional control over the generated scenario, this distribution is conditioned on a map and sets of tokens describing the desired scenario. We show that our approach has sufficient expressive capacity to model diverse traffic patterns and generalizes to different geographical regions.
Data-Driven Structured Policy Iteration for Homogeneous Distributed Systems
Alemzadeh, Siavash, Talebi, Shahriar, Mesbahi, Mehran
Control of networked systems, comprised of interacting agents, is often achieved through modeling the underlying interactions. Constructing accurate models of such interactions--in the meantime--can become prohibitive in applications. Data-driven control methods avoid such complications by directly synthesizing a controller from the observed data. In this paper, we propose an algorithm referred to as Data-driven Structured Policy Iteration (D2SPI), for synthesizing an efficient feedback mechanism that respects the sparsity pattern induced by the underlying interaction network. In particular, our algorithm uses temporary "auxiliary" communication links in order to enable the required information exchange on a (smaller) sub-network during the "learning phase" -- links that will be removed subsequently for the final distributed feedback synthesis. We then proceed to show that the learned policy results in a stabilizing structured policy for the entire network. Our analysis is then followed by showing the stability and convergence of the proposed distributed policies throughout the learning phase, exploiting a construct referred to as the "Patterned monoid.'' The performance of D2SPI is then demonstrated using representative simulation scenarios.
Simplicial Models for the Epistemic Logic of Faulty Agents
Goubault, Eric, Kniazev, Roman, Ledent, Jeremy, Rajsbaum, Sergio
In recent years, several authors have been investigating simplicial models, a model of epistemic logic based on higher-dimensional structures called simplicial complexes. In the original formulation, simplicial models were always assumed to be pure, meaning that all worlds have the same dimension. This is equivalent to the standard S5n semantics of epistemic logic, based on Kripke models. By removing the assumption that models must be pure, we can go beyond the usual Kripke semantics and study epistemic logics where the number of agents participating in a world can vary. This approach has been developed in a number of papers, with applications in fault-tolerant distributed computing where processes may crash during the execution of a system. A difficulty that arises is that subtle design choices in the definition of impure simplicial models can result in different axioms of the resulting logic. In this paper, we classify those design choices systematically, and axiomatize the corresponding logics. We illustrate them via distributed computing examples of synchronous systems where processes may crash.
ChoiceMates: Supporting Unfamiliar Online Decision-Making with Multi-Agent Conversational Interactions
Park, Jeongeon, Min, Bryan, Ma, Xiaojuan, Kim, Juho
Unfamiliar decisions -- decisions where people lack adequate domain knowledge or expertise -- specifically increase the complexity and uncertainty of the process of searching for, understanding, and making decisions with online information. Through our formative study (n=14), we observed users' challenges in accessing diverse perspectives, identifying relevant information, and deciding the right moment to make the final decision. We present ChoiceMates, a system that enables conversations with a dynamic set of LLM-powered agents for a holistic domain understanding and efficient discovery and management of information to make decisions. Agents, as opinionated personas, flexibly join the conversation, not only providing responses but also conversing among themselves to elicit each agent's preferences. Our between-subjects study (n=36) comparing ChoiceMates to conventional web search and single-agent showed that ChoiceMates was more helpful in discovering, diving deeper, and managing information compared to Web with higher confidence. We also describe how participants utilized multi-agent conversations in their decision-making process.
A simple learning agent interacting with an agent-based market model
Dicks, Matthew, Paskaramoorthy, Andrew, Gebbie, Tim
We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order-sizes and differently sized state spaces. The resulting impact on the agent-based model and market are considered using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. Here the smaller state space agents had the number of states they visited converge much faster than the larger state space agents, and they were able to start learning to trade intuitively using the spread and volume states. We find that the moments of the model are robust to the impact of the learning agents except for the Hurst exponent, which was lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign auto-correlations when their trading volumes increase.