Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Jan-13-2025, 20:16:19 GMT–Neural Information Processing Systems

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that Deep Cognitive Hierarchies (DCH) and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable PSRO-based method for finding approximate Nash equilibria in large zero-sum imperfect-information games.
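The sequential loop that makes plain PSRO slow can be illustrated on a toy game. The following is a minimal sketch, not the paper's implementation. It makes three assumptions: the game is a small zero-sum matrix game (rock-paper-scissors), an exact best response stands in for the reinforcement learning oracle, and fictitious play stands in for the meta-Nash solver. All function names are hypothetical.

```python
# PSRO-style (double oracle) loop on a zero-sum matrix game.
# Assumptions: exact best responses replace the RL oracle, and fictitious
# play replaces the meta-Nash solver. payoff[r][c] is the row player's payoff.

RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def solve_restricted_game(payoff, rows, cols, iters=2000):
    """Approximate a Nash equilibrium of the restricted game by fictitious play."""
    row_counts = [1] + [0] * (len(rows) - 1)
    col_counts = [1] + [0] * (len(cols) - 1)
    for _ in range(iters):
        # Value of each population strategy against the opponent's empirical play.
        row_vals = [sum(payoff[r][c] * col_counts[j] for j, c in enumerate(cols))
                    for r in rows]
        col_vals = [sum(payoff[r][c] * row_counts[i] for i, r in enumerate(rows))
                    for c in cols]
        row_counts[row_vals.index(max(row_vals))] += 1  # row player maximizes
        col_counts[col_vals.index(min(col_vals))] += 1  # column player minimizes
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([n / row_total for n in row_counts],
            [n / col_total for n in col_counts])


def best_response_row(payoff, cols, col_mix):
    """Best pure row strategy in the full game against the column mixture."""
    vals = [sum(payoff[r][c] * p for c, p in zip(cols, col_mix))
            for r in range(len(payoff))]
    best = max(range(len(vals)), key=vals.__getitem__)
    return best, vals[best]


def best_response_col(payoff, rows, row_mix):
    """Best pure column strategy in the full game against the row mixture."""
    vals = [sum(payoff[r][c] * p for r, p in zip(rows, row_mix))
            for c in range(len(payoff[0]))]
    best = min(range(len(vals)), key=vals.__getitem__)
    return best, vals[best]


def psro(payoff, max_iters=10):
    """Grow both populations one best response at a time until neither grows."""
    rows, cols = [0], [0]
    for _ in range(max_iters):
        row_mix, col_mix = solve_restricted_game(payoff, rows, cols)
        br_row, row_value = best_response_row(payoff, cols, col_mix)
        br_col, col_value = best_response_col(payoff, rows, row_mix)
        # Sum of what each player could gain by deviating from the meta-strategy.
        exploitability = row_value - col_value
        grew = False
        if br_row not in rows:
            rows.append(br_row)
            grew = True
        if br_col not in cols:
            cols.append(br_col)
            grew = True
        if not grew:
            break
    return rows, cols, row_mix, col_mix, exploitability
```

Calling `psro(RPS)` grows each population from a single strategy to all three pure strategies. In this sketch each best response is a cheap table lookup; in PSRO proper, that step is a full reinforcement learning run that cannot start until the previous one finishes, which is the bottleneck the abstract describes.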

approximate Nash equilibria, Pipeline PSRO, scalable approach

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)