Goto

Collaborating Authors

 strategy exploration


Policy Space Response Oracles: A Survey

Bighashdel, Ariyan, Wang, Yongzhao, McAleer, Stephen, Savani, Rahul, Oliehoek, Frans A.

arXiv.org Artificial Intelligence

Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.


Regularization for Strategy Exploration in Empirical Game-Theoretic Analysis

Wang, Yongzhao, Wellman, Michael P.

arXiv.org Artificial Intelligence

In iterative approaches to empirical game-theoretic analysis (EGTA), the strategy space is expanded incrementally based on analysis of intermediate game models. A common approach to strategy exploration, represented by the double oracle algorithm, is to add strategies that best-respond to a current equilibrium. This approach may suffer from overfitting and other limitations, leading the developers of the policy-space response oracle (PSRO) framework for iterative EGTA to generalize the target of best response, employing what they term meta-strategy solvers (MSSs). Noting that many MSSs can be viewed as perturbed or approximated versions of Nash equilibrium, we adopt an explicit regularization perspective to the specification and analysis of MSSs. We propose a novel MSS called regularized replicator dynamics (RRD), which simply truncates the process based on a regret criterion. We show that RRD is more adaptive than existing MSSs and outperforms them in various games. We extend our study to three-player games, for which the payoff matrix is cubic in the number of strategies and so exhaustively evaluating profiles may not be feasible. We propose a profile search method that can identify solutions from incomplete models, and combine this with iterative model construction using a regularized MSS. Finally, and most importantly, we reveal that the regret of best response targets has a tremendous influence on the performance of strategy exploration through experiments, which provides an explanation for the effectiveness of regularization in PSRO.