Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation

Chen, Zixiang, Zhou, Dongruo, Gu, Quanquan

arXiv.org Machine Learning 

Multi-agent reinforcement learning (MARL) has achieved tremendous practical success across a wide range of machine learning tasks, including large-scale strategy games such as GO (Silver et al., 2016), TexasHold'em poker (Brown and Sandholm, 2019), real-time video games such as Starcraft (Vinyals et al., 2019), and autonomous driving (Shalev-Shwartz et al., 2016). Among these models used in MARL, two-player zero-sum Markov games (MG) (Shapley, 1953; Littman, 1994) is probably one of the most widely studied models and can be regarded as a generalization of the Markov Decision Processes (MDP) (Puterman, 2014). In two-player Markov games, the two players share states, play actions simultaneously and independently, and observe the same reward. One player (i.e., max-player) aims to maximize the return while the other (i.e., min-player) aims to minimize it. A special case of general Markov games (i.e., simultaneous-move games) is turn-based games, where only one player can take action in each step, i.e., the max and min players take turns to play the game. The players aim to find the Nash equilibrium for this game.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found