Agent Societies
Deep Generalized Schr\"odinger Bridge
Liu, Guan-Horng, Chen, Tianrong, So, Oswin, Theodorou, Evangelos A.
Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling the collective behavior of individual agents interacting stochastically with a large population. In this work, we aim at solving a challenging class of MFGs in which the differentiability of these interacting preferences may not be available to the solver, and the population is urged to converge exactly to some desired distribution. These setups are, despite being well-motivated for practical purposes, complicated enough to paralyze most (deep) numerical solvers. Nevertheless, we show that Schr\"odinger Bridge - as an entropy-regularized optimal transport model - can be generalized to accepting mean-field structures, hence solving these MFGs. This is achieved via the application of Forward-Backward Stochastic Differential Equations theory, which, intriguingly, leads to a computational framework with a similar structure to Temporal Difference learning. As such, it opens up novel algorithmic connections to Deep Reinforcement Learning that we leverage to facilitate practical training. We show that our proposed objective function provides necessary and sufficient conditions to the mean-field problem. Our method, named Deep Generalized Schr\"odinger Bridge (DeepGSB), not only outperforms prior methods in solving classical population navigation MFGs, but is also capable of solving 1000-dimensional opinion depolarization, setting a new state-of-the-art numerical solver for high-dimensional MFGs. Our code will be made available at https://github.com/ghliu/DeepGSB.
Synthesis of Cost-Optimal Multi-Agent Systems for Resource Allocation
Multi-agent systems for resource allocation (MRAs) have been introduced as a concept for modelling competitive resource allocation problems in distributed computing. An MRA is composed of a set of agents and a set of resources. Each agent has goals in terms of allocating certain resources. For MRAs it is typically of importance that they are designed in a way such that there exists a strategy that guarantees that all agents will achieve their goals. The corresponding model checking problem is to determine whether such a winning strategy exists or not, and the synthesis problem is to actually build the strategy. While winning strategies ensure that all goals will be achieved, following such strategies does not necessarily involve an optimal use of resources. In this paper, we present a technique that allows to synthesise cost-optimal solutions to distributed resource allocation problems. We consider a scenario where system components such as agents and resources involve costs. A multi-agent system shall be designed that is cost-minimal but still capable of accomplishing a given set of goals. Our approach synthesises a winning strategy that minimises the cumulative costs of the components that are required for achieving the goals. The technique is based on a propositional logic encoding and a reduction of the synthesis problem to the maximum satisfiability problem (Max-SAT). Hence, a Max-SAT solver can be used to perform the synthesis. From a truth assignment that maximises the number of satisfied clauses of the encoding a cost-optimal winning strategy as well as a cost-optimal system can be immediately derived.
Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning
Hong, Yitian, Jin, Yaochu, Tang, Yang
In cooperative multi-agent reinforcement learning, centralized training and decentralized execution (CTDE) has achieved remarkable success. Individual Global Max (IGM) decomposition, which is an important element of CTDE, measures the consistency between local and joint policies. The majority of IGM-based research focuses on how to establish this consistent relationship, but little attention has been paid to examining IGM's potential flaws. In this work, we reveal that the IGM condition is a lossy decomposition, and the error of lossy decomposition will accumulated in hypernetwork-based methods. To address the above issue, we propose to adopt an imitation learning strategy to separate the lossy decomposition from Bellman iterations, thereby avoiding error accumulation. The proposed strategy is theoretically proved and empirically verified on the StarCraft Multi-Agent Challenge benchmark problem with zero sight view. The results also confirm that the proposed method outperforms state-of-the-art IGM-based approaches.
Valid Utility Games with Information Sharing Constraints
Grimsman, David, Brown, Philip N., Marden, Jason R.
The use of game theoretic methods for control in multiagent systems has been an important topic in recent research. Valid utility games in particular have been used to model real-world problems; such games have the convenient property that the value of any decision set which is a Nash equilibrium of the game is guaranteed to be within 1/2 of the value of the optimal decision set. However, an implicit assumption in this guarantee is that each agent is aware of the decisions of all other agents. In this work, we first describe how this guarantee degrades as agents are only aware of a subset of the decisions of other agents. We then show that this loss can be mitigated by restriction to a relevant subclass of games.
Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)
Mondal, Washim Uddin, Aggarwal, Vaneet, Ukkusuri, Satish V.
Mean-Field Control (MFC) has recently been proven to be a scalable tool to approximately solve large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to unconstrained cumulative reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specifically, we prove that, an $N$-agent constrained MARL problem, with state, and action spaces of each individual agents being of sizes $|\mathcal{X}|$, and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error, $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In a special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
An ensemble Multi-Agent System for non-linear classification
Fourez, Thibault, Verstaevel, Nicolas, Migeon, Frédéric, Schettini, Frédéric, Amblard, Frederic
Because of this non-linearity, their resolution requires more complex models often called "black boxes" because of their low explicability. In our research project, we aim to design a method to predict mobility information such as users' transport mode in real time from heterogeneous data (e.g., mobile phone data, smartphone sensors, etc.). This method must adapt quickly in a dynamic system where new transport modes and perturbations (e.g., changes in speed limits, COVID-19, etc.) may appear. Bringing up ever larger data streams requires the adoption of online learning techniques in which the model is updated with each new labeled point. Machine learning on dynamic systems (i.e., in which the behavior of individuals, the available sensors and the classes can evolve continuously) is one of the main motivations behind the design of Multi-Agent Systems (MAS). Recent approaches propose to transform a machine learning problem into a problem of cooperation between agents in order to reduce its complexity and to allow the system to adapt to the evolutions of the individuals (Capera et al., 2003). In this paper, we propose to use this collaborative approach to design an algorithm capable of solving supervised classification problems, some of which are non-linear, using linear classification models embedded in a multi-agent structure.
Collective Adaptation in Multi-Agent Systems: How Predator Confusion Shapes Swarm-Like Behaviors
Ivanov, Georgi, Palamas, George
Popular hypotheses about the origins of collective adaptation are related to two basic behaviours: protection from predators and a combined search for food resources. Among the anti-predator explanations, the predator confusion hypothesis suggests that groups of individuals moving in a swarm aim to overwhelm the predator while the dilution of risk hypothesis suggests that the probability of a single prey being targeted by a predator is lower in larger groups. In this paper, we explore how emergent behaviors arise from a predator-driven process as an adaptive response to external stimuli perceived as threatening. Moreover, we suggest a predator confusion process to provide a selective pressure for the prey to evolve group formations. We analyze the foraging and prey-predator dynamics evolved in terms of group density and formation, behavior consistency, predator evasion and success rate, and foraging rate. Two agents' perceptual models are compared. A local observation model, where agents can only see what's in their immediate vicinity, and a global observation model, where agents are able to see the predator at all times. Both models were evolved for predator avoidance, foraging and collision avoidance, using reinforcement learning in a simulated game environment. Our results suggest that the dilution of risk factor is sufficient to evolve group formations, and the predator confusion effect could play an important role in the evolution of collaborative behaviors. Finally, we show how variations in the information exchange of this social order can impact the global collective behaviors.
Social-PatteRNN: Socially-Aware Trajectory Prediction Guided by Motion Patterns
As robots across domains start collaborating with humans in shared environments, algorithms that enable them to reason over human intent are important to achieve safe interplay. In our work, we study human intent through the problem of predicting trajectories in dynamic environments. We explore domains where navigation guidelines are relatively strictly defined but not clearly marked in their physical environments. We hypothesize that within these domains, agents tend to exhibit short-term motion patterns that reveal context information related to the agent's general direction, intermediate goals and rules of motion, e.g., social behavior. From this intuition, we propose Social-PatteRNN, an algorithm for recurrent, multi-modal trajectory prediction that exploits motion patterns to encode the aforesaid contexts. Our approach guides long-term trajectory prediction by learning to predict short-term motion patterns. It then extracts sub-goal information from the patterns and aggregates it as social context. We assess our approach across three domains: humans crowds, humans in sports and manned aircraft in terminal airspace, achieving state-of-the-art performance.
Graphon Mean-Field Control for Cooperative Multi-Agent Reinforcement Learning
Hu, Yuanquan, Wei, Xiaoli, Yan, Junji, Zhang, Hengxi
Multi-agent reinforcement learning (MARL) has found various applications in the field of transportation and simulating [50, 1], stock price analyzing and trading [32, 31], wireless communication networks [12, 11, 13], and learning behaviors in social dilemmas [33, 28, 34]. MARL, however, becomes intractable due to the complex interactions among agents as the number of agents increases. A recent tractable approach is a mean-field approach by considering MARL in the regime with a large number of homogeneous agents under weak interactions [20]. According to the number of agents and learning goals, there are three subtle types of mean-field theories for MARL. The first one is called mean-field MARL (MF-MARL), which refers to the empirical average of the states or actions of a finite population. For example, [52] proposes to approximate interactions within the population of agents by averaging the actions of the overall population or neighboring agents.
Prismal view of ethics
Isufi, Sarah, Poje, Kristijan, Vukobratovic, Igor, Brcic, Mario
We shall have a hard look at ethics and try to extract insights in the form of abstract properties that might become tools. We want to connect ethics to games, talk about the performance of ethics, introduce curiosity into the interplay between competing and coordinating in well-performing ethics, and offer a view of possible developments that could unify increasing aggregates of entities. All this is under a long shadow cast by computational complexity that is quite negative about games. This analysis is the first step toward finding modeling aspects that might be used in AI ethics for integrating modern AI systems into human society.