Efficient exploration of zero-sum stochastic games

Martin, Carlos, Sandholm, Tuomas

arXiv.org Artificial Intelligence 

We study the problem of how to efficiently explore zero-sum games whose payoffs and dynamics are initially unknown. The agent is given a certain number of episodes to learn as much useful information about the game as possible. During this learning phase, the rewards obtained in the game are fictional and thus do not count toward the evaluation of the final strategy. After this exploration phase, the agent must recommend a strategy that should be minimally exploitable by an adversary (who has complete knowledge of the environment and can thus play optimally against that strategy). This setup is called pure exploration in the single-agent reinforcement learning literature. It is an important problem for simulation-based games, in which a black-box simulator is queried with strategies to obtain samples of the players' resulting utilities [33], as opposed to the rules of the game being explicitly given. For example, in many military settings, war game simulators are used to generate strategies, which then need to be ready to deploy in case of actual war [17]. Another prevalent example is finance, where trading strategies are generated in simulation and must then be ready for live trading. A third example is video games such as Dota 2 [4] and StarCraft II [31], where AIs can be trained largely through self-play.
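To make "minimally exploitable" concrete, the following is a minimal sketch, not the authors' method. It assumes the simplest special case, a one-shot zero-sum matrix game (matching pennies) whose payoff matrix and game value are known, whereas the setting above concerns stochastic games with unknown payoffs and dynamics. The function name `exploitability` and the example strategies are illustrative assumptions.

```python
def exploitability(payoff, strategy, game_value):
    """Gap between the game value and what `strategy` guarantees.

    payoff[i][j] is the row player's payoff when the row player picks
    action i and the adversary (column player) picks action j.
    The adversary knows the game and the strategy, so it picks the
    column that minimizes the row player's expected payoff.
    """
    n_rows = len(payoff)
    n_cols = len(payoff[0])
    # Expected row-player payoff against each pure adversary action.
    expected = [
        sum(strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    worst_case = min(expected)  # adversary's best response
    return game_value - worst_case


# Matching pennies (assumed example): the game value is 0.
payoff = [[1.0, -1.0],
          [-1.0, 1.0]]

uniform = [0.5, 0.5]  # equilibrium strategy
biased = [0.7, 0.3]   # leans toward the first action

print(exploitability(payoff, uniform, 0.0))
print(exploitability(payoff, biased, 0.0))
```

Under these assumptions the uniform strategy has zero exploitability, while the biased one can be exploited for roughly 0.4 per play. In the exploration setting the payoffs are not available like this, so the recommended strategy has to be chosen from what the exploration episodes revealed.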
