Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play

Bairamian, Daniel, Marcotte, Philippe, Romoff, Joshua, Robert, Gabriel, Nowrouzezahrai, Derek

Nov-28-2023–arXiv.org Artificial Intelligence

Recent advances in Competitive Self-Play (CSP) have matched, or even surpassed, human-level performance in complex game environments such as Dota 2 and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL). A core component of these methods is a pool of learning agents -- consisting of the Main Agent, past versions of this agent, and Exploiter Agents -- in which Exploiter Agents learn counter-strategies to the Main Agents. A key drawback of these approaches is the large computational cost and wall-clock time required to train the system, which makes them impractical to deploy in highly iterative real-life settings such as video game production. In this paper, we propose the Minimax Exploiter, a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents, leading to significant increases in data efficiency. We validate our approach in a diverse range of settings, including simple turn-based games, the Arcade Learning Environment, and For Honor, a modern video game. The Minimax Exploiter consistently outperforms strong baselines, demonstrating improved stability and data efficiency, leading to a robust CSP-MARL method that is both flexible and easy to deploy.
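The agent pool described in the abstract can be pictured with a minimal Python sketch. This is only an illustration of the three roles (Main Agent, past versions, Exploiter Agents), not the paper's implementation and not the Minimax Exploiter itself: the class name `AgentPool`, its methods, the agent labels, and the uniform opponent sampling are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AgentPool:
    """Hypothetical container for the three agent roles named in the abstract."""
    main_agent: str
    past_versions: list = field(default_factory=list)
    exploiters: list = field(default_factory=list)

    def snapshot_main(self, version_name: str) -> None:
        # Record a frozen copy of the Main Agent as a past version.
        self.past_versions.append(version_name)

    def opponents_for_main(self) -> list:
        # Assumption: the Main Agent may face past versions and exploiters.
        return self.past_versions + self.exploiters

    def sample_opponent(self, rng: random.Random) -> str:
        # Assumption: uniform sampling; real systems may weight opponents.
        return rng.choice(self.opponents_for_main())


pool = AgentPool(main_agent="main_v3")
pool.snapshot_main("main_v1")
pool.snapshot_main("main_v2")
pool.exploiters.append("exploiter_0")  # trained to counter the Main Agent
opponent = pool.sample_opponent(random.Random(0))
```

In this sketch agents are plain string labels; in practice each entry would hold a policy and its training state.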

artificial intelligence, machine learning, reinforcement learning

Country:
- North America
  - Canada > Quebec
    - Montreal (0.29)
  - United States > California
    - San Francisco County > San Francisco (0.14)
    - Santa Clara County > Palo Alto (0.04)
- Oceania > New Zealand
  - North Island > Auckland Region > Auckland (0.04)

Genre:
- Research Report (0.41)

Industry:
- Leisure & Entertainment > Games > Computer Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning
    - Agents (1.00)
    - Search (0.89)