Reviews: No-Press Diplomacy: Modeling Multi-Agent Gameplay

Neural Information Processing Systems

The dynamically changing alliances mean that the domain of Diplomacy presents unique challenges for agents. I agree with the authors that Diplomacy is 'deserving of special attention'; I would consider the full game to be a grand challenge for multi-agent research. With recent progress in large-scale RL focusing on single-agent and two-player zero-sum games, this problem is particularly timely. This work presents state-of-the-art agents trained with deep learning. To my knowledge, this is the first successful application of deep learning to Diplomacy.


Reviews: No-Press Diplomacy: Modeling Multi-Agent Gameplay

Neural Information Processing Systems

All reviewers agree that this paper explores interesting territory, namely multi-agent learning in the game of Diplomacy. It is a well-written and well-presented paper. The paper generated considerable discussion after the rebuttal, weighing the pros and cons of the work. The major point in favor of the work (as also indicated by the authors themselves) is that it lays groundwork for future research on Diplomacy, a game known to be very hard and challenging. The biggest concern is that the paper presents little innovation in the techniques it deploys; rather, it shows how the state of the art can be engineered to succeed in this domain to a certain extent, and illustrates the performance of known algorithms.


Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

Bakhtin, Anton, Wu, David J, Lerer, Adam, Gray, Jonathan, Jacob, Athul Paul, Farina, Gabriele, Miller, Alexander H, Brown, Noam

arXiv.org Artificial Intelligence

No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function. We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model. We used RL-DiL-piKL to train an agent we name Diplodocus. In a 200-game no-press Diplomacy tournament involving 62 human participants spanning skill levels from beginner to expert, two Diplodocus agents both achieved a higher average score than all other participants who played more than two games, and ranked first and third according to an Elo ratings model.

In two-player zero-sum (2p0s) settings, principled self-play algorithms converge to a minimax equilibrium, which in a balanced game ensures that a player will not lose in expectation regardless of the opponent's strategy (Neumann, 1928). This fact has allowed self-play, even without human data, to achieve remarkable success in 2p0s games like chess (Silver et al., 2018), Go (Silver et al., 2017), poker (Bowling et al., 2015; Brown & Sandholm, 2017), and Dota 2 (Berner et al., 2019). In principle, any finite 2p0s game can be solved via self-play given sufficient compute and memory. However, in games involving cooperation, self-play alone no longer guarantees good performance when playing with humans, even with infinite compute and memory.
This is because in complex domains there may be arbitrarily many conventions and expectations for how to cooperate, of which humans may use only a small subset (Lerer & Peysakhovich, 2019). The clearest example of this is language. A self-play agent trained from scratch without human data in a cooperative game involving free-form communication channels would almost certainly not converge to using English as the medium of communication. Obviously, such an agent would perform poorly when paired with a human English speaker. Indeed, prior work has shown that naïve extensions of self-play from scratch without human data perform poorly when playing with humans or human-like agents even in dialogue-free domains that involve cooperation rather than just competition, such as the benchmark games no-press Diplomacy (Bakhtin et al., 2021) and Hanabi (Siu et al., 2021; Cui et al., 2021).
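The regularization at the heart of DiL-piKL — pulling a reward-maximizing policy toward a human imitation policy — has a well-known closed form for a single decision: maximizing expected value minus a lambda-weighted KL penalty toward an anchor policy yields a policy proportional to anchor(a) * exp(Q(a) / lambda). A minimal sketch (the toy Q-values, anchor probabilities, and lambda settings below are illustrative, not values from the paper):

```python
import math

def kl_regularized_policy(q_values, anchor_policy, lam):
    """Closed-form solution of max_pi E_pi[Q] - lam * KL(pi || anchor):
    pi(a) is proportional to anchor(a) * exp(Q(a) / lam)."""
    weights = [p * math.exp(q / lam) for q, p in zip(q_values, anchor_policy)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: action 1 has the higher estimated value, but the
# human anchor strongly prefers action 0.
q = [1.0, 2.0]
anchor = [0.9, 0.1]

greedy = kl_regularized_policy(q, anchor, lam=0.1)    # small lam: near reward-maximizing
humanlike = kl_regularized_policy(q, anchor, lam=10)  # large lam: stays near the anchor
```

Varying lambda interpolates between pure reward maximization (lambda near 0) and pure imitation (large lambda), which is the knob the paper's algorithms tune.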


Modeling Strong and Human-Like Gameplay with KL-Regularized Search

Jacob, Athul Paul, Wu, David J., Farina, Gabriele, Lerer, Adam, Bakhtin, Anton, Andreas, Jacob, Brown, Noam

arXiv.org Artificial Intelligence

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g., AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that applying Monte Carlo tree search with policies regularized based on the KL divergence from an imitation-learned policy produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.


Human-Level Performance in No-Press Diplomacy via Equilibrium Search

Gray, Jonathan, Lerer, Adam, Bakhtin, Anton, Brown, Noam

arXiv.org Artificial Intelligence

Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings. In contrast, Diplomacy is a game of shifting alliances that involves both cooperation and competition. For this reason, Diplomacy has proven to be a formidable research challenge. In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via external regret minimization. External regret minimization techniques have been behind previous AI successes in adversarial games, most notably poker, but have not previously been shown to be successful in large-scale games involving cooperation. We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and achieves a rank of 23 out of 1,128 human players when playing anonymous games on a popular Diplomacy website.

A primary goal for AI research is to develop agents that can act optimally in real-world multi-agent interactions (i.e., games). However, previous large-scale game AI results have focused on either purely competitive or purely cooperative settings. In contrast, real-world games, such as business negotiations, politics, and traffic navigation, involve a far more complex mixture of cooperation and competition. In such settings, the theoretical grounding for the techniques used in previous AI breakthroughs falls apart. In this paper we augment neural policies trained through imitation learning with regret minimization search techniques, and evaluate on the benchmark game of no-press Diplomacy.
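The external regret minimization used in this agent's one-step lookahead builds on regret matching, the classic no-external-regret procedure: play each action with probability proportional to its positive cumulative regret. A minimal sketch of that core update (the numeric regrets are illustrative, not from the paper):

```python
def regret_matching(cum_regrets):
    """External-regret-minimizing strategy from cumulative regrets:
    play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    if total == 0.0:
        # No action has positive regret: fall back to uniform play.
        n = len(cum_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]

strategy = regret_matching([3.0, -1.0, 1.0])  # -> [0.75, 0.0, 0.25]
```

Iterating this update against the regrets induced by all players' strategies is what drives the equilibrium-search step; the full agent layers this on top of value estimates from the learned policy network.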


No-Press Diplomacy: Modeling Multi-Agent Gameplay

Paquette, Philip, Lu, Yuchen, Bocco, Seton Steven, Smith, Max, O.-G., Satya, Kummerfeld, Jonathan K., Pineau, Joelle, Singh, Satinder, Courville, Aaron C.

Neural Information Processing Systems

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Reliance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players. The model was trained on a new dataset of more than 150,000 human games. Our model is trained by supervised learning (SL) from expert trajectories, which is then used to initialize a reinforcement learning (RL) agent trained through self-play.
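The two-stage pipeline described here — supervised learning from expert trajectories, then self-play reinforcement learning initialized from the SL policy — can be sketched on a toy problem. Everything below (the frequency-count "supervised" stage, the bandit-style reward, the REINFORCE-like update) is a deliberately simplified stand-in, not the paper's actual architecture or training procedure:

```python
import random
from collections import Counter

def supervised_init(expert_actions, n_actions, smoothing=1.0):
    """Stage 1: imitate expert play via smoothed action frequencies
    (a toy stand-in for supervised learning on expert trajectories)."""
    counts = Counter(expert_actions)
    denom = len(expert_actions) + smoothing * n_actions
    return [(counts[a] + smoothing) / denom for a in range(n_actions)]

def self_play_improve(policy, reward_fn, steps=2000, lr=0.05, seed=0):
    """Stage 2: improve the SL policy with a simple policy-gradient-style
    loop (a toy stand-in for self-play RL on a one-step 'game')."""
    rng = random.Random(seed)
    probs = list(policy)
    for _ in range(steps):
        a = rng.choices(range(len(probs)), weights=probs)[0]
        r = reward_fn(a)
        # Shift probability toward rewarded actions, then renormalize.
        probs[a] = max(probs[a] + lr * r * (1.0 - probs[a]), 1e-6)
        total = sum(probs)
        probs = [p / total for p in probs]
    return probs

# Experts mostly play action 0, but the (hypothetical) reward favors action 1:
sl_policy = supervised_init([0] * 8 + [1] * 2, n_actions=2)
rl_policy = self_play_improve(sl_policy, lambda a: 1.0 if a == 1 else 0.0)
```

The point of the sketch is the initialization: the RL stage starts from the imitation policy rather than from scratch, so early self-play explores human-plausible actions while the reward signal gradually shifts probability mass toward stronger play.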