player 2
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Dominated Actions in Imperfect-Information Games
Dominance is a fundamental concept in game theory. In normal-form games, dominated strategies can be identified in polynomial time. As a consequence, iterative removal of dominated strategies can be performed efficiently as a preprocessing step for reducing the size of a game before computing a Nash equilibrium. For imperfect-information games in extensive form, we could convert the game to normal form and then iteratively remove dominated strategies in the same way; however, this conversion may cause an exponential blowup in game size. In this paper, we define and study the concept of dominated actions in imperfect-information games. Our main result is a polynomial-time algorithm for determining whether an action is dominated (strictly or weakly) by any mixed strategy in n-player games, which can be extended to an algorithm for iteratively removing dominated actions. This allows us to efficiently reduce the size of the game tree as a preprocessing step for Nash equilibrium computation. We explore the role of dominated actions empirically in "All In or Fold" No-Limit Texas Hold'em poker.
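To make the normal-form preprocessing step concrete, the following is a minimal sketch of iterated removal of strictly dominated strategies in a two-player normal-form game. It is a simplification and not the paper's algorithm: it only tests dominance by another pure strategy, whereas dominance by a mixed strategy (the case the abstract covers) would typically need a linear program. The function names and the example payoff matrices are assumptions introduced for illustration.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def strictly_dominated(payoff: Matrix, own: Sequence[int], opp: Sequence[int]) -> Set[int]:
    """Return the strategies in `own` that are strictly dominated by another
    PURE strategy in `own`, considering only the opponent strategies in `opp`.
    payoff[s][t] is the payoff for playing s against opponent strategy t."""
    dominated = set()
    for s in own:
        for other in own:
            if other != s and all(payoff[other][t] > payoff[s][t] for t in opp):
                dominated.add(s)
                break
    return dominated


def iterated_elimination(payoff_row: Matrix, payoff_col: Matrix) -> Tuple[List[int], List[int]]:
    """Iteratively remove pure-strategy strictly dominated rows and columns.
    Both matrices are indexed [row strategy][column strategy]."""
    rows = list(range(len(payoff_row)))
    cols = list(range(len(payoff_row[0])))
    # Transpose the column player's payoffs so her strategies index the first axis.
    payoff_col_t = [[payoff_col[r][c] for r in rows] for c in cols]
    changed = True
    while changed:
        changed = False
        dom_rows = strictly_dominated(payoff_row, rows, cols)
        if dom_rows:
            rows = [r for r in rows if r not in dom_rows]
            changed = True
        dom_cols = strictly_dominated(payoff_col_t, cols, rows)
        if dom_cols:
            cols = [c for c in cols if c not in dom_cols]
            changed = True
    return rows, cols


# Hypothetical example: Prisoner's Dilemma with strategies 0 = cooperate, 1 = defect.
pd_row = [[3.0, 0.0], [5.0, 1.0]]
pd_col = [[3.0, 5.0], [0.0, 1.0]]
surviving = iterated_elimination(pd_row, pd_col)  # only "defect" survives for both players
```

Each pass costs time polynomial in the size of the payoff matrices, which is why the same idea becomes expensive only after an extensive-form game has been expanded into an exponentially larger normal form.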
- North America > United States > Texas (0.24)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada (0.04)
- Workflow (0.55)
- Research Report (0.50)
- Asia > Taiwan (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > Canada > Alberta (0.14)
- North America > United States > Texas (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Game Theory (0.93)
Convergence of Regret Matching in Potential Games and Constrained Optimization
Anagnostides, Ioannis, Tewolde, Emanuel, Zhang, Brian Hu, Panageas, Ioannis, Conitzer, Vincent, Sandholm, Tuomas
Regret matching (RM) -- and its modern variants -- is a foundational online algorithm that has been at the heart of many AI breakthrough results in solving benchmark zero-sum games, such as poker. Yet, surprisingly little is known so far in theory about its convergence beyond two-player zero-sum games. For example, whether regret matching converges to Nash equilibria in potential games has been an open problem for two decades. Even beyond games, one could try to use RM variants for general constrained optimization problems. Recent empirical evidence suggests that they -- particularly regret matching$^+$ (RM$^+$) -- attain strong performance on benchmark constrained optimization problems, outperforming traditional gradient descent-type algorithms. We show that RM$^+$ converges to an $ε$-KKT point after $O_ε(1/ε^4)$ iterations, establishing for the first time that it is a sound and fast first-order optimizer. Our argument relates the KKT gap to the accumulated regret, two quantities that are entirely disparate in general but interact in an intriguing way in our setting, so much so that when regrets are bounded, our complexity bound improves all the way to $O_ε(1/ε^2)$. From a technical standpoint, while RM$^+$ does not have the usual one-step improvement property in general, we show that it does in a certain region that the algorithm will quickly reach and remain in thereafter. In sharp contrast, our second main result establishes a lower bound: RM, with or without alternation, can take an exponential number of iterations to reach a crude approximate solution even in two-player potential games. This represents the first worst-case separation between RM and RM$^+$. Our lower bound shows that convergence to coarse correlated equilibria in potential games is exponentially faster than convergence to Nash equilibria.
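As a rough illustration of the update rule discussed above, here is a minimal sketch of regret matching$^+$ in simultaneous self-play on a small identical-interest game (a simple instance of a potential game). It is not the paper's construction or analysis: the payoff matrix, the function names, and the choice of simultaneous (non-alternating) updates are assumptions made for the example. RM$^+$ differs from plain RM only in that the accumulated regrets are clipped at zero after every step.

```python
from typing import List, Tuple


def regret_matching_strategy(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive accumulated regret;
    fall back to the uniform strategy when no regret is positive."""
    positive = [max(0.0, r) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def rm_plus_update(regrets: List[float], utilities: List[float],
                   strategy: List[float]) -> List[float]:
    """RM+ step: add the instantaneous regret of each action, then clip at zero.
    (Plain RM would perform the same addition without the clipping.)"""
    expected = sum(s * u for s, u in zip(strategy, utilities))
    return [max(0.0, r + u - expected) for r, u in zip(regrets, utilities)]


def self_play(payoff: List[List[float]], iterations: int) -> Tuple[List[float], List[float]]:
    """Both players receive payoff[i][j] (identical interests) and run RM+."""
    n, m = len(payoff), len(payoff[0])
    q1, q2 = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x, y = regret_matching_strategy(q1), regret_matching_strategy(q2)
        # Expected utility of each pure action against the opponent's current strategy.
        u1 = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u2 = [sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        q1 = rm_plus_update(q1, u1, x)
        q2 = rm_plus_update(q2, u2, y)
    return regret_matching_strategy(q1), regret_matching_strategy(q2)


# Hypothetical 2x2 coordination game; (0, 0) is the payoff-maximizing equilibrium.
coordination = [[2.0, 0.0], [0.0, 1.0]]
final_x, final_y = self_play(coordination, 10)
```

On this toy game both players settle on action 0 after a single step; the exponential lower bound in the abstract concerns carefully constructed potential games, not easy instances like this one.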
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
Checklist
1. For all authors
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's
If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they
Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?
Did you include the total amount of compute and the type of resources used (e.g., type
Did you include any new assets either in the supplemental material or as a URL? [Yes]
Did you discuss whether and how consent was obtained from people whose data you're
If you used crowdsourcing or conducted research with human subjects... (a)
Table 2: Hyper-parameters of QMIX in the Tiger-Trampoline Experiment
- learning rate: 0.0005, 0.0001
- batch size: 16, 32
- ε annealing period: 20000, 10000
- RNN hidden dimension: 64, 32, 16
In Section 5.1, we show the results of MAPPO and QMIX on the Tiger-Trampoline game.
In the Hanabi experiments, we implement IMPROVISED as follows (better viewed together with the pseudocode). Player 1 and player 2 do not share the random seed beforehand.
We do not anticipate any immediate negative impact from this work.