AITopics | psro

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


e9bcd1b063077573285ae1a41025f5dc-Paper.pdf

Neural Information Processing Systems, Feb-10-2026, 22:48:14 GMT

P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.05)
Asia > Middle East > Jordan (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)


1cd73be1e256a7405516501e94e892ac-Supplemental.pdf

Neural Information Processing Systems, Feb-7-2026, 17:46:29 GMT

learning rate, psro, number, (14 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)


XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems, Dec-24-2025, 21:06:57 GMT

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.
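The population-growing loop that the abstract attributes to PSRO (best responses mixed at the root of the game) can be illustrated on a small matrix game. The following is a minimal sketch of a normal-form double oracle loop, not the authors' PSRO or XDO implementation: it assumes a zero-sum payoff matrix given as nested lists, uses exact best responses instead of RL, and swaps in fictitious play as an approximate solver for the restricted game. All function names (`fictitious_play`, `lift`, `exploitability`, `double_oracle`) are hypothetical.

```python
from typing import List, Tuple

# payoff[i][j] is the row player's payoff; the column player receives its negative.
Matrix = List[List[float]]


def fictitious_play(payoff: Matrix, iters: int) -> Tuple[List[float], List[float]]:
    """Approximate an equilibrium of a zero-sum matrix game via empirical play counts."""
    n, m = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0] = col_counts[0] = 1  # arbitrary initial plays
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(payoff[i][j] * col_counts[j] for j in range(m)) for i in range(n)]
        col_vals = [sum(payoff[i][j] * row_counts[i] for i in range(n)) for j in range(m)]
        row_counts[max(range(n), key=row_vals.__getitem__)] += 1
        col_counts[min(range(m), key=col_vals.__getitem__)] += 1
    total = iters + 1
    return [c / total for c in row_counts], [c / total for c in col_counts]


def lift(mix: List[float], population: List[int], size: int) -> List[float]:
    """Embed a mixture over a restricted population into the full action set."""
    full = [0.0] * size
    for prob, action in zip(mix, population):
        full[action] += prob
    return full


def exploitability(payoff: Matrix, x: List[float], y: List[float]) -> float:
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    n, m = len(payoff), len(payoff[0])
    row_best = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    col_best = min(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return row_best - col_best


def double_oracle(payoff: Matrix, iters: int = 5000):
    """Grow both populations with full-game best responses until none is new."""
    n, m = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # initial populations: one arbitrary action each
    while True:
        # Solve (approximately) the game restricted to the current populations.
        restricted = [[payoff[i][j] for j in cols] for i in rows]
        x_r, y_r = fictitious_play(restricted, iters)
        x, y = lift(x_r, rows, n), lift(y_r, cols, m)
        # Best responses in the FULL game against the restricted-game mixtures.
        row_vals = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_vals = [sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        br_row = max(range(n), key=row_vals.__getitem__)
        br_col = min(range(m), key=col_vals.__getitem__)
        if br_row in rows and br_col in cols:
            return rows, cols, x, y
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)


# Usage: rock-paper-scissors needs all three actions in each population.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rows, cols, x, y = double_oracle(rps)
print(rows, cols, exploitability(rps, x, y))
```

Because each pass either adds at least one new action or stops, the loop terminates after at most as many passes as there are actions in total. The restricted-game solver here is only approximate, so the returned mixtures are an approximate equilibrium; an exact linear-programming solver could replace `fictitious_play` without changing the outer loop.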

approximate nash equilibrium, double oracle algorithm, name change, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)


Neural Auto-Curricula in Two-Player Zero-Sum Games

Neural Information Processing Systems, Dec-23-2025, 20:22:36 GMT

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of who to compete with (i.e., the opponent mixture) and how to beat them (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with or even better than the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.

name change, neural auto-curriculum, two-player zero-sum game, (9 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)


8e4ccc9ca6ae2225c4cbb7782ab48daf-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 01:12:35 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.93)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Leisure & Entertainment > Sports > Soccer (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Supplementary Materials for XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems, Aug-17-2025, 05:50:00 GMT

's population policies chooses action In a given iteration, consider the restricted game for a single GMP game. If player 2 is not allowed an action unavailable to player 1, player 2's BR will be a new action In (k, m)-clone GMP with n classes, XDO adds at most 2n actions for each player. In total, 2n actions may be added for each player. Proposition 6. Like in that work, we represent actions that are in the restricted game by bold arrows. Extensive-form pure strategies specify an action at every infostate.

artificial intelligence, iteration, machine learning, (15 more...)

Neural Information Processing Systems

Country: