Wellman, Michael P.
Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO
Konicki, Christine, Chakraborty, Mithun, Wellman, Michael P.
Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form using data obtained from querying a simulator that represents a detailed description of the game. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information. First, we introduce a scalable representation for the empirical game tree where edges correspond to implicit policies learned through DRL. These policies cover conditions in the underlying game abstracted in the game model, supporting sustainable growth of the tree over epochs. Second, we leverage extensive form in the empirical model by employing refined Nash equilibria to direct strategy exploration. To enable this, we give a modular and scalable algorithm based on generalized backward induction for computing a subgame perfect equilibrium (SPE) in an imperfect-information game. We experimentally evaluate our approach on a suite of games including an alternating-offer bargaining game with outside offers; our results demonstrate that TE-PSRO converges toward equilibrium faster when new strategies are generated based on SPE rather than Nash equilibrium, and with reasonable time/memory requirements for the growing empirical model.
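As an illustration of the backward-induction principle underlying SPE computation, here is a minimal Python sketch for the perfect-information special case; the node encoding and function name are hypothetical, and the paper's algorithm generalizes this to subgames of an imperfect-information tree:

```python
def backward_induction(node):
    """Return (payoff_vector, strategy) for the subtree rooted at `node`.

    A node is either terminal, {'payoffs': (u1, u2, ...)}, or a decision
    node, {'name': str, 'player': i, 'children': {action: subtree}}.
    The returned strategy maps decision-node names to chosen actions.
    """
    if 'payoffs' in node:
        return node['payoffs'], {}
    player = node['player']
    strategy, best_action, best_value = {}, None, None
    for action, child in node['children'].items():
        value, sub = backward_induction(child)   # solve subgame first
        strategy.update(sub)
        if best_value is None or value[player] > best_value[player]:
            best_action, best_value = action, value
    strategy[node['name']] = best_action         # player i's best choice here
    return best_value, strategy
```

Because every subtree is solved before its parent, the resulting strategy is optimal in every subgame, which is exactly the SPE property in the perfect-information case.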
A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning
Li, Zun, Wellman, Michael P.
Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by stochasticity in training and sensitivity of agent performance to the behavior of other agents. We propose a meta-game evaluation framework for deep MARL, by framing each MARL algorithm as a meta-strategy, and repeatedly sampling normal-form empirical games over combinations of meta-strategies resulting from different random seeds. Each empirical game captures both self-play and cross-play factors across seeds. These empirical games provide …
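The seed-sampling construction can be sketched as follows; `simulate`, `empirical_meta_game`, and the dictionary encoding are hypothetical stand-ins, with each algorithm contributing one trained policy per random seed:

```python
import itertools
import random

def empirical_meta_game(algorithms, simulate, num_samples=50, rng=None):
    """Build one sampled empirical normal-form meta-game.

    algorithms: dict mapping algorithm name -> list of per-seed policies.
    simulate(p_row, p_col) -> (payoff_row, payoff_col) for one episode.
    Each cell averages payoffs over sampled seed pairings, so it mixes
    self-play and cross-play outcomes across seeds.
    """
    rng = rng or random.Random(0)
    game = {}
    for a, b in itertools.product(algorithms, repeat=2):
        totals = [0.0, 0.0]
        for _ in range(num_samples):
            p = rng.choice(algorithms[a])   # sample a seed for the row side
            q = rng.choice(algorithms[b])   # sample a seed for the column side
            u, v = simulate(p, q)
            totals[0] += u
            totals[1] += v
        game[(a, b)] = (totals[0] / num_samples, totals[1] / num_samples)
    return game
```

Repeating this sampling yields a distribution of empirical games on which meta-game statistics can be computed.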
Co-Learning Empirical Games and World Models
Smith, Max Olan, Wellman, Michael P.
Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as realized by a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO, a baseline empirical-game-building algorithm, Dyna-PSRO is found to compute lower-regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.
Empirical Game-Theoretic Analysis for Mean Field Games
Wang, Yongzhao, Wellman, Michael P.
We present a simulation-based approach for solution of mean field games (MFGs), using the framework of empirical game-theoretic analysis (EGTA). Our primary method employs a version of the double oracle, iteratively adding strategies based on best response to the equilibrium of the empirical MFG among strategies considered so far. We present Fictitious Play (FP) and Replicator Dynamics as two subroutines for computing the empirical game equilibrium. Due to a representation issue we highlight for MFGs, each subroutine is implemented with a query-based method rather than maintaining an explicit payoff matrix as in typical EGTA methods. By introducing game model learning and regularization, we significantly improve the sample efficiency of the primary method without sacrificing overall learning performance. Theoretically, we prove that a Nash equilibrium (NE) exists in the empirical MFG and show the convergence of iterative EGTA to NE of the full MFG with either subroutine. We test the performance of iterative EGTA in various games and show that it outperforms directly applying FP to MFGs in terms of iterations of strategy introduction.
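A minimal sketch of query-based fictitious play as an equilibrium subroutine, assuming a payoff oracle `payoff(s, dist)` in place of an explicit payoff matrix; the names and the crowd-aversion usage below are illustrative, not the paper's implementation:

```python
def fictitious_play(strategies, payoff, iters=200):
    """Query-based fictitious play over a finite strategy set.

    payoff(s, dist) -> expected payoff of pure strategy s against a
    population distribution dist (strategy -> probability).  Returns the
    empirical best-response distribution, an approximate equilibrium.
    """
    counts = {s: 1 for s in strategies}              # uniform prior
    for _ in range(iters):
        total = sum(counts.values())
        dist = {s: c / total for s, c in counts.items()}
        br = max(strategies, key=lambda s: payoff(s, dist))  # oracle query
        counts[br] += 1                              # average in the response
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}
```

In a crowd-averse game where each strategy's payoff is the negative of its own population share, this process settles near the even split.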
Regularization for Strategy Exploration in Empirical Game-Theoretic Analysis
Wang, Yongzhao, Wellman, Michael P.
In iterative approaches to empirical game-theoretic analysis (EGTA), the strategy space is expanded incrementally based on analysis of intermediate game models. A common approach to strategy exploration, represented by the double oracle algorithm, is to add strategies that best-respond to a current equilibrium. This approach may suffer from overfitting and other limitations, leading the developers of the policy-space response oracle (PSRO) framework for iterative EGTA to generalize the target of best response, employing what they term meta-strategy solvers (MSSs). Noting that many MSSs can be viewed as perturbed or approximated versions of Nash equilibrium, we adopt an explicit regularization perspective to the specification and analysis of MSSs. We propose a novel MSS called regularized replicator dynamics (RRD), which simply truncates the process based on a regret criterion. We show that RRD is more adaptive than existing MSSs and outperforms them in various games. We extend our study to three-player games, for which the payoff matrix is cubic in the number of strategies and so exhaustively evaluating profiles may not be feasible. We propose a profile search method that can identify solutions from incomplete models, and combine this with iterative model construction using a regularized MSS. Finally, and most importantly, our experiments reveal that the regret of best-response targets has a tremendous influence on the performance of strategy exploration, which provides an explanation for the effectiveness of regularization in PSRO.
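The truncation idea behind RRD can be sketched as discrete-time replicator dynamics stopped by a regret test; this is a hypothetical minimal version for symmetric two-player games, not the paper's implementation:

```python
import numpy as np

def regularized_replicator(payoffs, regret_threshold=0.1, step=0.1,
                           max_iters=10000):
    """Replicator dynamics truncated by a regret criterion.

    payoffs: row player's matrix of a symmetric 2-player game.
    Stops as soon as the mixture's regret (best pure-strategy payoff
    minus average payoff) falls below the threshold, returning an
    approximate, regularized equilibrium rather than an exact one.
    """
    n = payoffs.shape[0]
    x = np.full(n, 1.0 / n)
    regret = np.inf
    for _ in range(max_iters):
        u = payoffs @ x                      # payoff of each pure strategy vs x
        avg = x @ u
        regret = u.max() - avg
        if regret <= regret_threshold:       # the regularizing truncation
            break
        x = x * (1 + step * (u - avg))       # discrete replicator update
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x, regret
```

On a game with a dominant strategy, the dynamics concentrate on it until the regret test halts the process early.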
Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning
Li, Zun, Lanctot, Marc, McKee, Kevin R., Marris, Luke, Gemp, Ian, Hennes, Daniel, Muller, Paul, Larson, Kate, Bachrach, Yoram, Wellman, Michael P.
Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute approximate Nash equilibrium, and its performance in two negotiation games: Colored Trails, and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents ($N = 346$). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare negotiating with humans as humans trading among themselves.
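For intuition, the Nash bargaining solution over a finite set of candidate joint payoffs maximizes the product of gains over the disagreement point; the toy selector below is only a sketch of that principle, not the paper's meta-strategy solvers, which operate on learned game models:

```python
def nash_bargaining_choice(outcomes, disagreement):
    """Pick the joint payoff maximizing the Nash product of gains.

    outcomes: iterable of (u1, u2) candidate joint payoffs.
    disagreement: (d1, d2) payoffs if no agreement is reached.
    Candidates below the disagreement point are infeasible.
    """
    d1, d2 = disagreement
    feasible = [(u1, u2) for u1, u2 in outcomes if u1 >= d1 and u2 >= d2]
    if not feasible:
        return disagreement                  # no deal beats disagreeing
    return max(feasible, key=lambda uv: (uv[0] - d1) * (uv[1] - d2))
```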
A Regression Approach for Modeling Games With Many Symmetric Players
Wiedenbeck, Bryce (Swarthmore College) | Yang, Fengjun (Swarthmore College) | Wellman, Michael P. (University of Michigan)
We exploit player symmetry to formulate the representation of large normal-form games as a regression task. This formulation allows arbitrary regression methods to be employed in estimating utility functions from a small subset of the game's outcomes. We demonstrate the applicability of both neural networks and Gaussian process regression, but focus on the latter. Once utility functions are learned, computing Nash equilibria requires estimating expected payoffs of pure-strategy deviations from mixed-strategy profiles. Computing these expectations exactly requires an infeasible sum over the full payoff matrix, so we propose and test several approximation methods. Three of these are simple and generic, applicable to any regression method and games with any number of player roles. However, the best performance is achieved by a continuous integral that approximates the summation, which we formulate for the specific case of fully-symmetric games learned by Gaussian process regression with a radial basis function kernel. We demonstrate experimentally that the combination of learned utility functions and expected payoff estimation allows us to efficiently identify approximate equilibria of large games using sparse payoff data.
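One of the simple, generic estimators can be sketched as Monte Carlo sampling of opponent profiles; this is a hypothetical minimal version in which `utility` stands in for the learned regression model:

```python
import random
from collections import Counter

def deviation_payoff(utility, strategy, mixture, num_opponents,
                     samples=2000, rng=None):
    """Estimate the expected payoff of a pure-strategy deviation.

    utility(strategy, opponent_counts) -> payoff from the learned model,
    where opponent_counts is a Counter of opponents' strategies.
    mixture: dict strategy -> probability for the mixed profile.
    Samples opponent profiles i.i.d. from the mixture instead of
    summing over the full (infeasible) payoff matrix.
    """
    rng = rng or random.Random(0)
    names = list(mixture)
    weights = [mixture[s] for s in names]
    total = 0.0
    for _ in range(samples):
        opponents = rng.choices(names, weights=weights, k=num_opponents)
        total += utility(strategy, Counter(opponents))
    return total / samples
```

The estimate converges to the exact multinomial expectation as the sample count grows, at cost linear in `samples` rather than exponential in the number of players.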
Welfare Effects of Market Making in Continuous Double Auctions
Wah, Elaine, Wright, Mason, Wellman, Michael P.
We investigate the effects of market making on market performance, focusing on allocative efficiency as well as gains from trade accrued by background traders. We employ empirical simulation-based methods to evaluate heuristic strategies for market makers as well as background investors in a variety of complex trading environments. Our market model incorporates private and common valuation elements, with dynamic fundamental value and asymmetric information. In this context, we compare the surplus achieved by background traders in strategic equilibrium, with and without a market maker. Our findings indicate that the presence of the market maker strongly tends to increase total welfare across various environments. Market-maker profit may or may not exceed the welfare gain, thus the effect on background-investor surplus is ambiguous. We find that market making tends to benefit investors in relatively thin markets, and situations where background traders are impatient, due to limited trading opportunities. The presence of additional market makers increases these benefits, as competition drives the market makers to provide liquidity at lower price spreads. A thorough sensitivity analysis indicates that these results are robust to reasonable changes in model parameters.
The Role of Calculi in Uncertain Inference Systems
Wellman, Michael P., Heckerman, David
Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.
Exploiting Functional Dependencies in Qualitative Probabilistic Reasoning
Wellman, Michael P.
Functional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables.
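For background, the basic QPN sign algebra that this work refines can be sketched as follows; the sharpening of ambiguous results in the presence of deterministic variables is the paper's contribution and is not captured here:

```python
def sign_product(a, b):
    """Combine signs of chained qualitative influences (a then b)."""
    if '0' in (a, b):
        return '0'                 # a null influence annihilates the chain
    if '?' in (a, b):
        return '?'                 # ambiguity propagates through the chain
    return '+' if a == b else '-'  # like signs reinforce, unlike signs flip

def sign_sum(a, b):
    """Combine signs of parallel qualitative influences."""
    if a == '0':
        return b
    if b == '0':
        return a
    return a if a == b else '?'    # conflicting parallel paths are ambiguous
```

The `'?'` results of `sign_sum` are exactly the ambiguities that exploiting functional dependencies can sometimes resolve.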