Independent Policy Gradient Methods for Competitive Reinforcement Learning
–Neural Information Processing Systems
MinimaxvShapley[63]showed gameG, thereexists( 1, 2)suchthat V ( 1, 2) V ( 1, 2) V ( 1, 2), forall 1, 2, (1) andinparticularV = min 1max 2V ( 1, 2)=max 2min 1V ( 1, 2). Thecruxxplayer timescalethany-player, they-player Compared 43], whichestablishesy-player gradientdominancey-player' ofthegradient t, (y) = ( f(xt, )) (y), then averageusing Is Q-learningprovably Inin Neural Information Processing Systems, pages 4863-4873, 2018.
Neural Information Processing Systems
Feb-8-2026, 03:45:17 GMT
- Country:
- Technology: