

 Agent Societies



Supplementary Materials of The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

Neural Information Processing Systems

We assume here that all agents share critic and actor networks, for notational convenience. Gaussian distribution, from which an action is sampled, in continuous action spaces. In the loss functions above, B refers to the batch size and n refers to the number of agents. The Multi-agent Particle-World Environment (MPE) was introduced by Lowe et al. (2017). The StarCraft II Micromanagement Challenge (SMAC) tasks were introduced by Rashid et al. (2019).



EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning

Neural Information Processing Systems

Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamic systems. In many applications, effective understanding of the situation and accurate trajectory prediction of interactive agents play a significant role in downstream tasks, such as decision making and planning.


SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Neural Information Processing Systems

In the single-agent setting, unsupervised learning has been incorporated into RL so that the agent acquires diverse skills without extrinsic reward from the environment; this setting is known as unsupervised reinforcement learning (URL).


Strategic Behavior is Bliss: Iterative Voting Improves Social Welfare

Neural Information Processing Systems

Recent work in iterative voting has defined the additive dynamic price of anarchy (ADPoA) as the difference in social welfare between the truthful and worst-case equilibrium profiles resulting from repeated strategic manipulations. While iterative plurality has been shown to return only alternatives with at most one fewer initial vote than the truthful winner, it is less understood how agents' welfare changes in equilibrium. To this end, we differentiate agents' utility from their manipulation mechanism and determine iterative plurality's ADPoA in the worst and average cases. We first prove that the worst-case ADPoA is linear in the number of agents. To overcome this negative result, we study the average-case ADPoA and prove that equilibrium winners have a constant-order welfare advantage over the truthful winner in expectation. Our positive results illustrate the prospect for social welfare to increase due to strategic manipulation.
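
To make the ADPoA definition concrete, the following is a minimal sketch for a tiny electorate, not the paper's method. It rests on several assumptions that the abstract does not state: agents move one at a time using direct best replies (an agent switches only to an alternative that then becomes the winner and that it strictly prefers to the current winner), ties are broken in favor of the lowest-indexed alternative, utilities are rank-based (`utility[r]` is the value an agent gets when its rank-`r` alternative wins), and the sign convention is truthful welfare minus worst equilibrium welfare. All function names are hypothetical.

```python
from collections import deque


def plurality_winner(votes, m):
    """Winner under plurality; ties go to the lowest index (assumed rule)."""
    scores = [0] * m
    for v in votes:
        scores[v] += 1
    return scores.index(max(scores))


def best_response(votes, i, prefs, m):
    """Direct best reply of agent i, or None if i has no improving move."""
    rank = {alt: r for r, alt in enumerate(prefs[i])}
    best_rank = rank[plurality_winner(votes, m)]
    best_move = None
    for alt in range(m):
        if alt == votes[i]:
            continue
        new_votes = votes[:i] + (alt,) + votes[i + 1:]
        # Direct reply: the alternative voted for must become the winner,
        # and agent i must strictly prefer it to the best option so far.
        if plurality_winner(new_votes, m) == alt and rank[alt] < best_rank:
            best_rank, best_move = rank[alt], new_votes
    return best_move


def social_welfare(winner, prefs, utility):
    """Sum of rank-based utilities that the agents assign to the winner."""
    return sum(utility[p.index(winner)] for p in prefs)


def adpoa(prefs, utility):
    """Truthful welfare minus the worst welfare over reachable equilibria."""
    m = len(prefs[0])
    truthful = tuple(p[0] for p in prefs)
    seen, queue, eq_winners = {truthful}, deque([truthful]), set()
    while queue:  # breadth-first search over all best-response paths
        votes = queue.popleft()
        moves = [best_response(votes, i, prefs, m) for i in range(len(prefs))]
        moves = [mv for mv in moves if mv is not None]
        if not moves:  # nobody wants to deviate: an equilibrium profile
            eq_winners.add(plurality_winner(votes, m))
        for mv in moves:
            if mv not in seen:
                seen.add(mv)
                queue.append(mv)
    if not eq_winners:
        raise ValueError("no equilibrium reachable from the truthful profile")
    sw_truth = social_welfare(plurality_winner(truthful, m), prefs, utility)
    return sw_truth - min(social_welfare(w, prefs, utility) for w in eq_winners)


# Five agents rank three alternatives (0, 1, 2), most preferred first.
example_prefs = [(0, 1, 2), (0, 1, 2), (1, 2, 0), (1, 0, 2), (2, 1, 0)]
example_adpoa = adpoa(example_prefs, utility=(2, 1, 0))
```

In this example the truthful winner is alternative 0 (welfare 5), the last agent switches its vote to alternative 1, and the resulting equilibrium winner has welfare 7, so `example_adpoa` is -2. Under the sign convention assumed here, a negative value means that strategic manipulation raised social welfare. The exhaustive search is only practical for very small instances.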