Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems 

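Minimax ... Shapley [63] showed that for any game $G$, there exists $(\pi_1^\star, \pi_2^\star)$ such that

$$V(\pi_1^\star, \pi_2) \le V(\pi_1^\star, \pi_2^\star) \le V(\pi_1, \pi_2^\star) \quad \text{for all } \pi_1, \pi_2, \tag{1}$$

and in particular $V^\star = \min_{\pi_1} \max_{\pi_2} V(\pi_1, \pi_2) = \max_{\pi_2} \min_{\pi_1} V(\pi_1, \pi_2)$.

The crux ... x-player ... timescale than y-player, ... the y-player ...

Compared ... [43], which establishes ... y-player ... gradient dominance ... y-player's ...

... of the gradient ... t, (y) = ( f(x_t, ...)) (y), then average using ...

... Is Q-learning provably ... In ... Neural Information Processing Systems, pages 4863-4873, 2018.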
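The saddle-point condition (1) and the min-max identity that follows from it can be checked numerically in the simplest special case, a single-state zero-sum matrix game. The sketch below is only an illustration under stated assumptions: the payoff matrix `A` (matching pennies), the grid resolution, and the names `value`, `p_star`, and `q_star` are made up here and do not come from the paper. Player 1 minimizes and player 2 maximizes, matching the ordering in (1).

```python
# Single-state (matrix-game) illustration of the saddle-point condition (1).
# Assumption: the payoff matrix A and the grid over mixed strategies are
# hypothetical choices for this sketch, not taken from the paper.

A = [[1.0, -1.0],
     [-1.0, 1.0]]  # matching pennies; player 1 minimizes, player 2 maximizes


def value(p, q):
    """Expected payoff when player 1 plays row 0 with probability p
    and player 2 plays column 0 with probability q."""
    x = [p, 1.0 - p]
    y = [q, 1.0 - q]
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))


# Discretize each player's mixed strategies on a uniform grid.
grid = [k / 100 for k in range(101)]

# min over player 1 of max over player 2, and the reverse order.
min_max = min(max(value(p, q) for q in grid) for p in grid)
max_min = max(min(value(p, q) for p in grid) for q in grid)

# The uniform strategies form the equilibrium of matching pennies.
p_star, q_star = 0.5, 0.5
v_star = value(p_star, q_star)

# Check V(p*, q) <= V(p*, q*) <= V(p, q*) over the whole grid,
# with a small tolerance for floating-point rounding.
saddle_ok = all(
    value(p_star, q) <= v_star + 1e-12 and v_star <= value(p, q_star) + 1e-12
    for p in grid
    for q in grid
)

print(min_max, max_min, v_star, saddle_ok)
```

Both orders of optimization agree up to rounding here because the equilibrium strategy lies on the grid; on a coarser grid that misses it, the two quantities would only bracket the game value.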
