Agent Societies
Robust Coordination of Linear Threshold Dynamics on Directed Weighted Networks
Arditti, Laura, Como, Giacomo, Fagnani, Fabio, Vanelli, Martina
We study asynchronous dynamics in a network of interacting agents updating their binary states according to a time-varying threshold rule. Specifically, agents revise their state asynchronously by comparing the weighted average of the current states of their neighbors in the interaction network with possibly heterogeneous time-varying threshold values. Such thresholds are determined by an exogenous signal representing an external influence field modeling the different agents' biases towards one state with respect to the other one. We prove necessary and sufficient conditions for global stability of consensus equilibria, i.e., equilibria where all agents have the same state, robustly with respect to the (constant or time-varying) external field. Our results apply to general weighted directed interaction networks and build on super-modularity properties of certain network coordination games whose best response dynamics coincide with the linear threshold dynamics. In particular, we introduce a novel notion of robust improvement paths for such games and characterize conditions for their existence.
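As an illustration of the update rule described in this abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes constant thresholds (the paper also allows time-varying ones), a uniformly random choice of the agent that revises at each step, and that an agent keeps its state when the weighted average equals its threshold. The names `threshold_step`, `simulate`, and the `weights` layout are hypothetical.

```python
import random


def threshold_step(states, weights, thresholds, i):
    """Return agent i's revised binary state under a linear threshold rule.

    weights[i] maps each out-neighbor j of agent i to a positive weight
    (hypothetical layout chosen for this sketch).
    """
    total = sum(weights[i].values())
    if total == 0:
        return states[i]  # no neighbors: keep the current state
    avg = sum(w * states[j] for j, w in weights[i].items()) / total
    if avg > thresholds[i]:
        return 1
    if avg < thresholds[i]:
        return 0
    return states[i]  # tie: keep the current state (an assumption)


def simulate(states, weights, thresholds, steps, seed=0):
    """Asynchronous dynamics: one uniformly random agent revises per step."""
    rng = random.Random(seed)
    states = list(states)
    for _ in range(steps):
        i = rng.randrange(len(states))
        states[i] = threshold_step(states, weights, thresholds, i)
    return states


# Complete graph on three agents with unit weights and thresholds 1/2.
weights = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0}}
thresholds = [0.5, 0.5, 0.5]
final = simulate([1, 1, 0], weights, thresholds, steps=200)
```

In this small example both consensus configurations are fixed points, and from `[1, 1, 0]` only agent 2 ever has a reason to switch, so the run settles on the all-ones consensus once agent 2 is selected.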
PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning
Zhou, Hanhan, Lan, Tian, Aggarwal, Vaneet
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. These methods allow a joint action-value function to be optimized by maximizing factorized per-agent utilities, thanks to monotonicity. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints (across different states) on the representable function class, causing significant estimation error during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate the proposed PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results show that PAC outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
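The role of monotonicity in value factorization can be illustrated with a small Python sketch. It is not PAC itself: it uses the simplest monotone mixing, an additive (VDN-style) sum of per-agent utilities, and checks by brute force that greedy per-agent action selection recovers the joint maximizer. The utility values and function names are made up for illustration.

```python
from itertools import product


def joint_value(utilities, joint_action):
    """Additive mixing: the joint value is monotone in every per-agent utility."""
    return sum(u[a] for u, a in zip(utilities, joint_action))


def decentralized_greedy(utilities):
    """Each agent independently picks the action maximizing its own utility."""
    return tuple(max(range(len(u)), key=u.__getitem__) for u in utilities)


def centralized_greedy(utilities):
    """Brute-force search over all joint actions (exponential in the number of agents)."""
    action_sets = [range(len(u)) for u in utilities]
    return max(product(*action_sets), key=lambda ja: joint_value(utilities, ja))


# Hypothetical per-agent utilities: agent 0 has three actions, agent 1 has two.
utilities = [[0.1, 0.7, 0.2], [0.5, 0.4]]
local_choice = decentralized_greedy(utilities)
joint_choice = centralized_greedy(utilities)
```

Because the mixing is monotone, the two selections coincide here. The limitation the abstract points to concerns which joint value functions such a factorization can represent, which this toy example does not capture.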
Effect of Swarm Density on Collective Tracking Performance
Kwa, Hian Lee, Philippot, Julien, Bouffanais, Roland
How does the size of a swarm affect its collective action? Despite being arguably a key parameter, no systematic and satisfactory guiding principles exist to select the number of units required for a given task and environment. Even when limited by practical considerations, system designers should endeavor to identify what a reasonable swarm size should be. Here, we show that this fundamental question is closely linked to that of selecting an appropriate swarm density. Our analysis of the influence of density on the collective performance of a target tracking task reveals different 'phases' corresponding to markedly distinct group dynamics. We identify a 'transition' phase, in which a complex emergent collective response arises. Interestingly, the collective dynamics within this transition phase exhibit a clear trade-off between exploratory actions and exploitative ones. We show that at any density, the exploration-exploitation balance can be adjusted to maximize the system's performance through various means, such as by changing the level of connectivity between agents. While the density is the primary factor to be considered, it should not be the sole one to be accounted for when sizing the system. Due to the inherent finite-size effects present in physical systems, we establish that the number of constituents primarily affects system-level properties such as exploitation in the transition phase. These results illustrate that instead of learning and optimizing a swarm's behavior for a specific set of task parameters, further work should instead concentrate on learning to be adaptive, thereby endowing the swarm with the highly desirable feature of being able to operate effectively over a wide range of circumstances.
Verse: A Python library for reasoning about multi-agent hybrid system scenarios
Li, Yangge, Zhu, Haoqing, Braught, Katherine, Shen, Keyi, Mitra, Sayan
We present the Verse library with the aim of making hybrid system verification more usable for multi-agent scenarios. In Verse, decision making agents move in a map and interact with each other through sensors. The decision logic for each agent is written in a subset of Python and the continuous dynamics is given by a black-box simulator. Multiple agents can be instantiated and they can be ported to different maps for creating scenarios. Verse provides functions for simulating and verifying such scenarios using existing reachability analysis algorithms. We illustrate several capabilities and use cases of the library with heterogeneous agents, incremental verification, different sensor models, and the flexibility of plugging in different subroutines for post computations.
A Survey on Distributed Online Optimization and Game
Li, Xiuxian, Xie, Lihua, Li, Na
Distributed online optimization and games have been increasingly researched in the last decade, mostly motivated by their wide applications in sensor networks, robotics (e.g., distributed target tracking and formation control), smart grids, deep learning, and so forth. In these problems, there is a network of agents who may be cooperative (i.e., distributed online optimization) or noncooperative (i.e., online games) and who interact through local information exchanges. The local cost function of each agent is often time-varying in dynamic and even adversarial environments. At each time, a decision must be made by each agent based on the historical information at hand, without knowing future information on cost functions. For these problems, a comprehensive survey is still lacking. This paper aims to provide a thorough overview of distributed online optimization and games from the perspective of problem settings, communication, computation, algorithms, and performance. In addition, some potential future directions are also discussed.
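To make the cooperative setting concrete, here is a minimal Python sketch of one common scheme, consensus-based distributed online gradient descent. It is an illustrative assumption, not an algorithm taken from the survey: each agent averages its neighbors' decisions with a doubly stochastic matrix `W` and then takes a gradient step on its own cost, which is revealed only after the decision is committed. The scalar quadratic costs, the constant step size, and all names are hypothetical.

```python
def distributed_ogd_step(x, W, grads, step):
    """One round: mix decisions with neighbors, then descend along local gradients."""
    n = len(x)
    mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [mixed[i] - step * grads[i] for i in range(n)]


def run(targets, W, step=0.1):
    """targets[t][i] is the minimizer of agent i's cost (x - c)^2 at round t."""
    n = len(W)
    x = [0.0] * n
    history = []
    for c_t in targets:
        history.append(list(x))  # decision is committed before the cost is revealed
        grads = [2.0 * (x[i] - c_t[i]) for i in range(n)]  # gradient of (x - c)^2
        x = distributed_ogd_step(x, W, grads, step)
    return x, history


# Two agents that fully average with each other; costs are static in this toy run.
W = [[0.5, 0.5], [0.5, 0.5]]
targets = [[1.0, 3.0]] * 200
x_final, history = run(targets, W)
```

In this toy run the network average of the decisions approaches 2.0, the minimizer of the summed cost. With a constant step size the two agents stay a fixed distance apart rather than reaching exact consensus.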
Favoring Eagerness for Remaining Items: Designing Efficient, Fair, and Strategyproof Mechanisms
Guo, Xiaoxi, Sikdar, Sujoy, Xia, Lirong, Cao, Yongzhi (Peking University), Wang, Hanpin
In the assignment problem, the goal is to assign indivisible items to agents who have ordinal preferences, efficiently and fairly, in a strategyproof manner. In practice, first-choice maximality, i.e., assigning a maximal number of agents their top items, is often identified as an important efficiency criterion and measure of agents' satisfaction. In this paper, we propose a natural and intuitive efficiency property, favoring-eagerness-for-remaining-items (FERI), which requires that each item is allocated to an agent who ranks it highest among remaining items, thereby implying first-choice maximality. Using FERI as a heuristic, we design mechanisms that satisfy ex-post or ex-ante variants of FERI together with combinations of other desirable properties of efficiency (Pareto-efficiency), fairness (strong equal treatment of equals and sd-weak-envy-freeness), and strategyproofness (sd-weak-strategyproofness). We also explore the limits of FERI mechanisms in providing stronger efficiency, fairness, or strategyproofness guarantees through impossibility results.
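The FERI idea, giving each item to an agent who ranks it highest among the items still remaining, can be illustrated with a simple round-based Python sketch. It is not one of the paper's mechanisms: ties between agents claiming the same item are broken by a fixed priority order, a hypothetical choice that by itself gives no fairness or strategyproofness guarantee. The function name and data layout are assumptions.

```python
def eager_assignment(preferences, priority=None):
    """Assign at most one item per agent, round by round.

    preferences[agent] lists items from most to least preferred.
    priority is the (hypothetical) tie-breaking order over agents.
    """
    agents = list(priority) if priority is not None else list(preferences)
    remaining = {item for prefs in preferences.values() for item in prefs}
    assignment = {}
    unassigned = list(agents)
    while unassigned and remaining:
        claims = {}
        for agent in unassigned:  # earlier agents win ties
            top = next((it for it in preferences[agent] if it in remaining), None)
            if top is not None and top not in claims:
                claims[top] = agent
        if not claims:
            break  # nobody wants any remaining item
        for item, agent in claims.items():
            assignment[agent] = item
            remaining.discard(item)
        unassigned = [a for a in unassigned if a not in assignment]
    return assignment


prefs = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
result = eager_assignment(prefs)
```

Here agents `a` and `c` receive their first choices in the first round, and `b` then receives `z`, its top choice among the remaining items. Every item that is some agent's first choice goes to an agent who ranks it first, which is the first-choice maximality property mentioned in the abstract.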
Strategic Behavior is Bliss: Iterative Voting Improves Social Welfare
Recent work in iterative voting has defined the additive dynamic price of anarchy (ADPoA) as the difference in social welfare between the truthful and worst-case equilibrium profiles resulting from repeated strategic manipulations. While iterative plurality has been shown to only return alternatives with at most one fewer initial vote than the truthful winner, it is less understood how agents' welfare changes in equilibrium. To this end, we differentiate agents' utility from their manipulation mechanism and determine iterative plurality's ADPoA in the worst and average cases. We first prove that the worst-case ADPoA is linear in the number of agents. To overcome this negative result, we study the average-case ADPoA and prove that equilibrium winners have a constant-order welfare advantage over the truthful winner in expectation. Our positive results illustrate the prospect for social welfare to increase due to strategic manipulation.
Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning
Bouton, Maxime, Jeong, Jaeseong, Outes, Jose, Mendo, Adriano, Nikou, Alexandros
Future generations of mobile networks are expected to contain more and more antennas with growing complexity and more parameters. Optimizing these parameters is necessary for ensuring the good performance of the network. The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies. Reinforcement learning is a promising technique to address this challenge but existing methods often use local optimizations to scale to large network deployments. We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally. By using a value decomposition approach, our algorithm can be trained from a global reward function instead of relying on an ad-hoc decomposition of the network performance across the different cells. The algorithm uses a graph neural network architecture which generalizes to different network topologies and learns coordination behaviors. We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
On Multi-Agent Deep Deterministic Policy Gradients and their Explainability for SMARTS Environment
Multi-agent RL (MARL) is one of the complex problems in the autonomous driving literature that hampers the release of fully autonomous vehicles today. Several simulators have gone through successive iterations since their inception to address the problem of complex scenarios with multiple agents in autonomous driving. One such simulator, SMARTS, emphasizes the importance of cooperative multi-agent learning. For this problem, we discuss two approaches, MAPPO and MADDPG, which are on-policy and off-policy RL approaches, respectively. We compare our results with the state-of-the-art results for this challenge and discuss potential areas of improvement, along with the explainability of these approaches in conjunction with waypoints in the SMARTS environment.
How the World Economic Forum Plans to Bring Leaders Together in the Metaverse
There are many companies angling to make money in the metaverse at the moment, but far fewer trying to use its technology for public good. The World Economic Forum hopes to change that with the Global Collaboration Village, which will be introduced at Davos this year ahead of a full rollout. The virtual village has been designed to function--and look--like the real Swiss town, except that here the people convening in co-working spaces, attending conferences in government buildings, and browsing museums will be doing so as avatars. WEF executive chairman Klaus Schwab, who has spent decades cultivating in-person interactions between world leaders, hopes the village will serve as a consistent meeting ground for Davos' stakeholders, transforming the conference from a cloistered one-week gathering to a year-round project. "This could revolutionize global collaboration," Schwab told TIME in the weeks before the January gathering.