Parallel Algorithm for Approximating Nash Equilibrium in Multiplayer Stochastic Games with Application to Naval Strategic Planning Sam Ganzfried 1, Conner Laughlin 2, Charles Morefield 2 1 Ganzfried Research 2 Arctan, Inc. Abstract Many real-world domains contain multiple agents behaving strategically with probabilistic transitions and uncertain (potentially infinite) duration. Such settings can be modeled as stochastic games. While algorithms have been developed for solving (i.e., computing a game-theoretic solution concept such as Nash equilibrium) two-player zero-sum stochastic games, research on algorithms for nonzero-sum and multi-player stochastic games is very limited. We present a new algorithm for these settings, which constitutes the first parallel algorithm for multiplayer stochastic games. We present experimental results on a 4-player stochastic game motivated by a naval strategic planning scenario, showing that our algorithm is able to quickly compute strategies constituting Nash equilibrium up to a very small degree of approximation. Introduction Nash equilibrium has emerged as the most compelling solution concept in multiagent strategic interactions. For two-player zero-sum (adversarial) games, a Nash equilibrium can be computed in polynomial time (e.g., by linear programming). This result holds both for simultaneous-move games (often represented as a matrix), and for sequential games of both perfect and imperfect information (often represented as an extensive-form game tree).
Creating strong agents for games with more than two players is a major open problem in AI. Common approaches are based on approximating game-theoretic solution concepts such as Nash equilibrium, which have strong theoretical guarantees in two-player zero-sum games, but no guarantees in non-zero-sum games or in games with more than two players. We describe an agent that is able to defeat a variety of realistic opponents using an exact Nash equilibrium strategy in a 3-player imperfect-information game. This shows that, despite a lack of theoretical guarantees, agents based on Nash equilibrium strategies can be successful in multiplayer games after all.
Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large imperfect-information games. It iteratively traverses the game tree in order to converge to a Nash equilibrium. In order to deal with extremely large games, CFR typically uses domain-specific heuristics to simplify the target game in a process known as abstraction. This simplified game is solved with tabular CFR, and its solution is mapped back to the full game. This paper introduces Deep Counterfactual Regret Minimization (Deep CFR), a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. We show that Deep CFR is principled and achieves strong performance in large poker games. This is the first non-tabular variant of CFR to be successful in large games.
Deep counterfactual value networks combined with continual resolving provide a way to conduct depth-limited search in imperfect-information games. However, since their introduction in the DeepStack poker AI, deep counterfactual value networks have not seen widespread adoption. In this paper we introduce several improvements to deep counterfactual value networks, as well as counterfactual regret minimization, and analyze the effects of each change. We combined these improvements to create the poker AI Supremus. We show that while a reimplementation of DeepStack loses head-to-head against the strong benchmark agent Slumbot, Supremus successfully beats Slumbot by an extremely large margin and also achieves a lower exploitability than DeepStack against a local best response. Together, these results show that with our key improvements, deep counterfactual value networks can achieve state-of-the-art performance.
Counterfactual Regret Minimization (CFR) is a popular iterative algorithm for approximating Nash equilibria in imperfect-information multi-step two-player zero-sum games. We introduce the first general, principled method for warm starting CFR. Our approach requires only a strategy for each player, and accomplishes the warm start at the cost of a single traversal of the game tree. The method provably warm starts CFR to as many iterations as it would have taken to reach a strategy profile of the same quality as the input strategies, and does not alter the convergence bounds of the algorithms. Unlike prior approaches to warm starting, ours can be applied in all cases. Our method is agnostic to the origins of the input strategies. For example, they can be based on human domain knowledge, the observed strategy of a strong agent, the solution of a coarser abstraction, or the output of some algorithm that converges rapidly at first but slowly as it gets closer to an equilibrium. Experiments demonstrate that one can improve overall convergence in a game by first running CFR on a smaller, coarser abstraction of the game and then using the strategy in the abstract game to warm start CFR in the full game.