 exploitability





XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games.


Appendices

Neural Information Processing Systems

Diplomacy is a complex environment, where training requires significant time. The action is an allocation of the player's coins across the fields: the player decides how many of its $c$ coins to put in each of the fields, choosing $c_1, c_2, \ldots, c_f$ where $\sum_{i=1}^{f} c_i = c$. Finally, Blotto is a single-turn (i.e.