Agent Societies
Average-Case Analysis of Iterative Voting
It is well-known in social choice that people may misreport their preferences to improve group decisions in their favor. Consider, for example, Alice, Bob, and Charlie deciding on which ice cream flavor to order for a party, and Charlie prefers strawberry to chocolate to vanilla. Given that Alice wants chocolate and Bob wants vanilla, Charlie would be better off voting for chocolate than truthfully (i.e., strawberry), by which vanilla may win as the tie-breaker. This form of strategic behavior is prolific in political science in narrowing the number of political parties (see e.g., Duvuger's law [Riker, 1982]). Still, it is unclear what effect strategic behavior has on the social welfare of chosen outcomes. Iterative voting (IV) is one model which naturally describes agents' strategic behavior - in misreporting their truthful preferences - over time. After agents reveal their preferences initially, they have the opportunity to repeatedly update their votes given information about other agents' votes, before the final decision is reached. Meir et al. [2010] first proposed iterative plurality voting and identified many sufficient conditions for IV to converge. This was followed up by a series of work examining various social choice rules, information and behavioral assumptions, and settings to determine when, to what outcomes, and how fast IV converges (see e.g.
Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning
Liu, Zeyang, Wan, Lipeng, Yang, Xinrui, Chen, Zhuoran, Chen, Xingyu, Lan, Xuguang
Effective exploration is crucial to discovering optimal strategies for multi-agent reinforcement learning (MARL) in complex coordination tasks. Existing methods mainly utilize intrinsic rewards to enable committed exploration or use role-based learning for decomposing joint action spaces instead of directly conducting a collective search in the entire action-observation space. However, they often face challenges obtaining specific joint action sequences to reach successful states in long-horizon tasks. To address this limitation, we propose Imagine, Initialize, and Explore (IIE), a novel method that offers a promising solution for efficient multi-agent exploration in complex scenarios. IIE employs a transformer model to imagine how the agents reach a critical state that can influence each other's transition functions. Then, we initialize the environment at this state using a simulator before the exploration phase. We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively. The prompt consists of timestep-to-go, return-to-go, influence value, and one-shot demonstration, specifying the desired state and trajectory as well as guiding the action generation. By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important under-explored regions. Despite its simplicity, empirical results demonstrate that our method outperforms multi-agent exploration baselines on the StarCraft Multi-Agent Challenge (SMAC) and SMACv2 environments. Particularly, IIE shows improved performance in the sparse-reward SMAC tasks and produces more effective curricula over the initialized states than other generative methods, such as CVAE-GAN and diffusion models.
Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems
Suau, Miguel, He, Jinke, รelikok, Mustafa Mert, Spaan, Matthijs T. J., Oliehoek, Frans A.
Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.
Privacy-Preserving Distributed Optimization and Learning
Distributed optimization and learning has recently garnered great attention due to its wide applications in sensor networks, smart grids, machine learning, and so forth. Despite rapid development, existing distributed optimization and learning algorithms require each agent to exchange messages with its neighbors, which may expose sensitive information and raise significant privacy concerns. In this survey paper, we overview privacy-preserving distributed optimization and learning methods. We first discuss cryptography, differential privacy, and other techniques that can be used for privacy preservation and indicate their pros and cons for privacy protection in distributed optimization and learning. We believe that among these approaches, differential privacy is most promising due to its low computational and communication complexities, which are extremely appealing for modern learning based applications with high dimensions of optimization variables. We then introduce several differential-privacy algorithms that can simultaneously ensure privacy and optimization accuracy. Moreover, we provide example applications in several machine learning problems to confirm the real-world effectiveness of these algorithms. Finally, we highlight some challenges in this research domain and discuss future directions.
Complex behavior from intrinsic motivation to occupy action-state path space
Ramรญrez-Ruiz, Jorge, Grytskyy, Dmytro, Mastrogiuseppe, Chiara, Habib, Yamen, Moreno-Bote, Rubรฉn
Most theories of behavior posit that agents tend to maximize some form of reward or utility. However, animals very often move with curiosity and seem to be motivated in a reward-free manner. Here we abandon the idea of reward maximization, and propose that the goal of behavior is maximizing occupancy of future paths of actions and states. According to this maximum occupancy principle, rewards are the means to occupy path space, not the goal per se; goal-directedness simply emerges as rational ways of searching for resources so that movement, understood amply, never ends. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state path occupancy. We provide analytical expressions that relate the optimal policy and state-value function, and prove convergence of our value iteration algorithm. Using discrete and continuous state tasks, including a high--dimensional controller, we show that complex behaviors such as `dancing', hide-and-seek and a basic form of altruistic behavior naturally result from the intrinsic motivation to occupy path space. All in all, we present a theory of behavior that generates both variability and goal-directedness in the absence of reward maximization.
Cooperation and Control in Delegation Games
Sourbut, Oliver, Hammond, Lewis, Wood, Harriet
With the continuing development of powerful and increasing general AI systems, we are likely to see many more Control and cooperation can in turn be broken down into tasks delegated to autonomous machines, from writing problems of alignment and of capabilities [7, 9, 22]. For example, emails to driving us from place to place. Moreover, these in the control failure above, the first AV might drive machines are increasingly likely to come into contact with undesirably by taking route A even though their passenger each other when acting on behalf of their human principals, prefers the scenic beachfront (an alignment problem), whether they are virtual personal assistants attempting to or the second AV might undesirably take route B because schedule a meeting or autonomous vehicles (AVs) using it is incapable of calculating the best route accurately (a the same road network. We refer to these multi-principal, capabilities problem). Similarly, in the cooperation failure, multi-agent scenarios as delegation games, an example of the AVs might cause congestion because they cannot which is as follows, and is shown in Figure 1.
What Do Computing and Economics Have to Say to Each Other?
I described a 1999 result by Elias Koutsoupias and Christos Papadimitriou, regarding multi-agent systems. They studied systems in which non-cooperative agents share a common resource and proposed the ratio between the worst possible Nash equilibrium and the social optimum as a measure of the effectiveness of the system. This ratio has become known as the "Price of Anarchy," as it measures how far from optimal such non-cooperative systems can be. They showed that the price of anarchy could be arbitrarily high, depending on the complexity of the system. The Price-of-Anarchy concept has later been extended to other types of equilibria--for example, Pareto-Optimal Equilibria.b
Platforms for Efficient and Incentive-Aware Collaboration
Haghtalab, Nika, Qiao, Mingda, Yang, Kunhe
Collaboration is crucial for reaching collective goals. However, its effectiveness is often undermined by the strategic behavior of individual agents -- a fact that is captured by a high Price of Stability (PoS) in recent literature [Blum et al., 2021]. Implicit in the traditional PoS analysis is the assumption that agents have full knowledge of how their tasks relate to one another. We offer a new perspective on bringing about efficient collaboration among strategic agents using information design. Inspired by the growing importance of collaboration in machine learning (such as platforms for collaborative federated learning and data cooperatives), we propose a framework where the platform has more information about how the agents' tasks relate to each other than the agents themselves. We characterize how and to what degree such platforms can leverage their information advantage to steer strategic agents toward efficient collaboration. Concretely, we consider collaboration networks where each node is a task type held by one agent, and each task benefits from contributions made in their inclusive neighborhood of tasks. This network structure is known to the agents and the platform, but only the platform knows each agent's real location -- from the agents' perspective, their location is determined by a random permutation. We employ private Bayesian persuasion and design two families of persuasive signaling schemes that the platform can use to ensure a small total workload when agents follow the signal. The first family aims to achieve the minmax optimal approximation ratio compared to the optimal collaboration, which is shown to be $\Theta(\sqrt{n})$ for unit-weight graphs, $\Theta(n^{2/3})$ for graphs with constant minimum edge weights, and $O(n^{3/4})$ for general weighted graphs. The second family ensures per-instance strict improvement compared to full information disclosure.
Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games
Oldenburg, Ninell, Zhi-Xuan, Tan
A universal feature of human societies is the adoption of systems of rules and norms in the service of cooperative ends. How can we build learning agents that do the same, so that they may flexibly cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there exists a shared set of norms that most others comply with while pursuing their individual desires, even if they do not know the exact content of those norms. By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of norms, even if they initially diverge in their beliefs about what the norms are. This in turn enables the stability of the normative system: since agents can bootstrap common knowledge of the norms, this leads the norms to be widely adhered to, enabling new entrants to rapidly learn those norms. We formalize this framework in the context of Markov games and demonstrate its operation in a multi-agent environment via approximately Bayesian rule induction of obligative and prohibitive norms. Using our approach, agents are able to rapidly learn and sustain a variety of cooperative institutions, including resource management norms and compensation for pro-social labor, promoting collective welfare while still allowing agents to act in their own interests.
Learning Progress Driven Multi-Agent Curriculum
Zhao, Wenshuai, Li, Zhiyuan, Pajarinen, Joni
Curriculum reinforcement learning (CRL) aims to speed up learning by gradually increasing the difficulty of a task, usually quantified by the achievable expected return. Inspired by the success of CRL in single-agent settings, a few works have attempted to apply CRL to multi-agent reinforcement learning (MARL) using the number of agents to control task difficulty. However, existing works typically use manually defined curricula such as a linear scheme. In this paper, we first apply state-of-the-art single-agent self-paced CRL to sparse reward MARL. Although with satisfying performance, we identify two potential flaws of the curriculum generated by existing reward-based CRL methods: (1) tasks with high returns may not provide informative learning signals and (2) the exacerbated credit assignment difficulty in tasks where more agents yield higher returns. Thereby, we further propose self-paced MARL (SPMARL) to prioritize tasks based on \textit{learning progress} instead of the episode return. Our method not only outperforms baselines in three challenging sparse-reward benchmarks but also converges faster than self-paced CRL.