Open-ended Learning in Symmetric Zero-sum Games

Balduzzi, David, Garnelo, Marta, Bachrach, Yoram, Czarnecki, Wojciech M., Perolat, Julien, Jaderberg, Max, Graepel, Thore

Jan-23-2019–arXiv.org Machine Learning

Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.

agent, algorithm, gamescape, (13 more...)

arXiv.org Machine Learning

Jan-23-2019

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment > Games > Chess (0.48)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (1.00)
    - Machine Learning > Reinforcement Learning (0.94)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found