Provable Self-Play Algorithms for Competitive Reinforcement Learning

Bai, Yu, Jin, Chi Artificial Intelligence 

This paper studies competitive reinforcement learning (competitive RL), that is, reinforcement learning with two or more agents taking actions simultaneously, but each maximizing their own reward. Competitive RL is a major branch of the more general setting of multi-agent reinforcement learning (MARL), with the specification that the agents have conflicting rewards (so that they essentially compete with each other) yet can be trained in a centralized fashion (i.e. each agent has access to the other agents' policies) (Crandall and Goodrich, 2005). There are substantial recent progresses in competitive RL, in particular in solving hard multi-player games such as GO (Silver et al., 2017), Starcraft (Vinyals et al., 2019), and Dota 2 (OpenAI, 2018). A key highlight in their approaches is the successful use of self-play for achieving superhuman performance in absence of human knowledge or expert opponents. These self-play algorithms are able to learn a good policy for all players from scratch through repeatedly playing the current policies against each other and performing policy updates using these self-played game trajectories. The empirical success of self-play has challenged the conventional wisdom that expert opponents are necessary for achieving good performance, and calls for a better theoretical understanding. In this paper, we take initial steps towards understanding the effectiveness of self-play algorithms in competitive RL from a theoretical perspective.

Duplicate Docs Excel Report

None found

Similar Docs  Excel Report  more

None found