Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games
Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm
In competitive two-agent environments, deep reinforcement learning (RL) methods based on the \emph{Double Oracle (DO)} algorithm, such as \emph{Policy Space Response Oracles (PSRO)} and \emph{Anytime PSRO (APSRO)}, iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies approximates a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce \emph{Self-Play PSRO (SP-PSRO)}, a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO additionally learns an approximately optimal stochastic policy and adds it to the population. As a result, SP-PSRO empirically tends to converge much faster than APSRO, and in many games it converges in just a few iterations.
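To illustrate the population-growing loop the abstract describes, here is a minimal sketch on a small zero-sum matrix game. This is not the authors' implementation: SP-PSRO trains deep-RL best responses in extensive-form games, whereas this toy uses a fixed payoff matrix, solves the restricted Nash mixture exactly with a linear program, and substitutes a crude hypothetical stand-in for the learned stochastic policy. All function and variable names (solve_restricted_nash, sp_psro_sketch, etc.) are invented for illustration.

```python
"""Toy PSRO-style population loop on a zero-sum matrix game (sketch only)."""
import numpy as np
from scipy.optimize import linprog


def solve_restricted_nash(payoffs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Exact Nash mixtures of a zero-sum matrix game (row player maximizes)."""

    def row_solution(A: np.ndarray) -> np.ndarray:
        m, n = A.shape
        # Variables: m mixture weights followed by the game value v.
        # Maximize v  s.t.  (A^T x)_j >= v for every column j,  sum(x) = 1,  x >= 0.
        c = np.zeros(m + 1)
        c[-1] = -1.0                                  # linprog minimizes, so minimize -v
        A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - (A^T x)_j <= 0
        b_ub = np.zeros(n)
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
        b_eq = np.ones(1)
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:m]

    # The column player of A is the row player of -A^T.
    return row_solution(payoffs), row_solution(-payoffs.T)


def sp_psro_sketch(game: np.ndarray, iterations: int = 5):
    """Each iteration adds a deterministic best response and an extra
    stochastic policy (here a hypothetical stand-in, not the paper's
    off-policy-trained policy)."""
    n_rows, n_cols = game.shape
    row_pop = [np.eye(n_rows)[0]]          # start from arbitrary pure policies
    col_pop = [np.eye(n_cols)[0]]
    for _ in range(iterations):
        # Restricted game: expected payoffs between current population members.
        restricted = np.array([[r @ game @ c for c in col_pop] for r in row_pop])
        row_mix, col_mix = solve_restricted_nash(restricted)
        # Aggregate strategies induced by the restricted Nash mixture.
        row_strategy = sum(w * r for w, r in zip(row_mix, row_pop))
        col_strategy = sum(w * c for w, c in zip(col_mix, col_pop))
        # Deterministic best responses (the standard PSRO/APSRO oracle step).
        row_br = np.eye(n_rows)[int(np.argmax(game @ col_strategy))]
        col_br = np.eye(n_cols)[int(np.argmin(row_strategy @ game))]
        # SP-PSRO also adds a stochastic policy each iteration; as a stand-in
        # we mix the best response with the current aggregate strategy.
        row_pop += [row_br, 0.5 * row_br + 0.5 * row_strategy]
        col_pop += [col_br, 0.5 * col_br + 0.5 * col_strategy]
    return row_pop, col_pop


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    payoffs = rng.standard_normal((8, 8))   # random zero-sum matrix game
    row_pop, col_pop = sp_psro_sketch(payoffs)
    print(f"row population size after 5 iterations: {len(row_pop)}")
```

Under these assumptions the loop grows the population by two policies per iteration (one deterministic best response plus one stochastic addition), which is the structural difference from PSRO/APSRO that the abstract highlights.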
arXiv.org Artificial Intelligence
Jul-13-2022
- Country:
- North America > United States (0.29)
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment > Games (1.00)
- Technology: