A Policy-Gradient Approach to Solving Imperfect-Information Games with Iterate Convergence

Liu, Mingyang, Farina, Gabriele, Ozdaglar, Asuman

Aug-1-2024–arXiv.org Artificial Intelligence

Policy gradient methods have become a staple of any single-agent reinforcement learning toolbox, due to their combination of desirable properties: iterate convergence, efficient use of stochastic trajectory feedback, and theoretically-sound avoidance of importance sampling corrections. In multi-agent imperfect-information settings (extensive-form games), however, it is still unknown whether the same desiderata can be guaranteed while retaining theoretical guarantees. Instead, sound methods for extensive-form games rely on approximating counterfactual values (as opposed to Q values), which are incompatible with policy gradient methodologies. In this paper, we investigate whether policy gradient can be safely used in two-player zero-sum imperfect-information extensive-form games (EFGs). W e establish positive results, showing for the first time that a policy gradient method leads to provable best-iterate convergence to a regularized Nash equilibrium in self-play .

convergence, counterfactual value, q-value, (15 more...)

arXiv.org Artificial Intelligence

Aug-1-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:
- Research Report (0.69)

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning
      - Reinforcement Learning (0.48)
      - Neural Networks (0.45)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found