Goto

Collaborating Authors

 David C. Parkes



Finding Friend and Foe in Multi-Agent Games

Neural Information Processing Systems

Recent breakthroughs in AI for multi-agent games like Go, Poker, and Dota, have seen great strides in recent years. Yet none of these games address the real-life challenge of cooperation in the presence of unknown and uncertain teammates. This challenge is a key game mechanism in hidden role games. Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on The Resistance: Avalon, the most popular hidden role game. DeepRole combines counterfactual regret minimization (CFR) with deep value networks trained through self-play.


Long-term Causal Effects via Behavioral Game Theory

Neural Information Processing Systems

Planned experiments are the gold standard in reliably comparing the causal effect of switching from a baseline policy to a new policy. One critical shortcoming of classical experimental methods, however, is that they typically do not take into account the dynamic nature of response to policy changes. For instance, in an experiment where we seek to understand the effects of a new ad pricing policy on auction revenue, agents may adapt their bidding in response to the experimental pricing changes. Thus, causal effects of the new pricing policy after such adaptation period, the long-term causal effects, are not captured by the classical methodology even though they clearly are more indicative of the value of the new policy. Here, we formalize a framework to define and estimate long-term causal effects of policy changes in multiagent economies. Central to our approach is behavioral game theory, which we leverage to formulate the ignorability assumptions that are necessary for causal inference. Under such assumptions we estimate long-term causal effects through a latent space approach, where a behavioral model of how agents act conditional on their latent behaviors is combined with a temporal model of how behaviors evolve over time.


Multi-View Decision Processes: The Helper-AI Problem

Neural Information Processing Systems

We consider a two-player sequential game in which agents have the same reward function but may disagree on the transition probabilities of an underlying Markovian model of the world. By committing to play a specific policy, the agent with the correct model can steer the behavior of the other agent, and seek to improve utility. We model this setting as a multi-view decision process, which we use to formally analyze the positive effect of steering policies. Furthermore, we develop an algorithm for computing the agents' achievable joint policy, and we experimentally show that it can lead to a large utility increase when the agents' models diverge.