Independent Policy Gradient Methods for Competitive Reinforcement Learning

Feb-8-2026, 03:45:17 GMT–Neural Information Processing Systems

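Minimax value. Shapley [63] showed that for every game G, there exists a pair of policies $(\pi_1^\star, \pi_2^\star)$ such that

$$V(\pi_1^\star, \pi_2) \le V(\pi_1^\star, \pi_2^\star) \le V(\pi_1, \pi_2^\star) \quad \text{for all } \pi_1, \pi_2, \tag{1}$$

and in particular $V^\star = \min_{\pi_1} \max_{\pi_2} V(\pi_1, \pi_2) = \max_{\pi_2} \min_{\pi_1} V(\pi_1, \pi_2)$. The crux … x-player … timescale than the y-player, the y-player … Compared to [43], which establishes … y-player gradient dominance … y-player's … of the gradient … (y) = (… f(x_t, …))(y), then average using … Is Q-learning provably … In Neural Information Processing Systems, pages 4863–4873, 2018.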
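To make the saddle-point inequality (1) and the equality of min-max and max-min concrete, here is a minimal sketch that specializes to a single-state game, i.e. a two-action zero-sum matrix game. This specialization is an assumption made only for illustration; the paper's setting is a multi-state stochastic game. The payoff matrix `A`, the helper names (`value`, `grid`, `min_max`, `max_min`, `is_saddle`) and the grid resolution are all hypothetical. Player 1 (rows) minimizes and player 2 (columns) maximizes $V(x, y) = x^\top A y$, matching the orientation of (1).

```python
from fractions import Fraction

# Hypothetical single-state example (assumption, not from the paper):
# player 1 picks a row and minimizes, player 2 picks a column and maximizes.
A = [[3, -1],
     [-2, 1]]


def value(A, x, y):
    """Expected payoff x^T A y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def grid(n):
    """Mixed strategies (p, 1 - p) over two actions, with p = k/n for k = 0..n."""
    return [(Fraction(k, n), 1 - Fraction(k, n)) for k in range(n + 1)]


def min_max(A, n):
    """min over x of max over y of V(x, y), both restricted to the grid."""
    return min(max(value(A, x, y) for y in grid(n)) for x in grid(n))


def max_min(A, n):
    """max over y of min over x of V(x, y), both restricted to the grid."""
    return max(min(value(A, x, y) for x in grid(n)) for y in grid(n))


def is_saddle(A, x_star, y_star, n):
    """Check V(x*, y) <= V(x*, y*) <= V(x, y*) for every grid strategy x, y."""
    v_star = value(A, x_star, y_star)
    return all(value(A, x_star, y) <= v_star <= value(A, x, y_star)
               for x in grid(n) for y in grid(n))


if __name__ == "__main__":
    # Closed-form equilibrium of this 2x2 game from the indifference conditions:
    # V = 7pq - 2p - 3q + 1, so p* = 3/7, q* = 2/7 and the value is 1/7.
    x_star = (Fraction(3, 7), Fraction(4, 7))
    y_star = (Fraction(2, 7), Fraction(5, 7))
    print(min_max(A, 14), max_min(A, 14))    # 1/7 1/7
    print(is_saddle(A, x_star, y_star, 14))  # True
```

Exact `Fraction` arithmetic avoids floating-point ties. Because $V$ is linear in each player's mixed strategy, each inner optimum is attained at a pure strategy, which lies on the grid, and the equilibrium $(3/7, 2/7)$ lies on the $n = 14$ grid, so in this particular game the grid values coincide with the exact minimax value $1/7$. At a fully mixed equilibrium both inequalities in (1) hold with equality, as this example shows.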

machine learning, machine learning research, reinforcement learning

Country:
- North America
  - United States > Massachusetts
    - Middlesex County > Cambridge (0.05)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East
  - Jordan (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Duplicate Docs

Title
3b2acfe2e38102074656ed938abf4ac3-Paper.pdf

Similar Docs

Title	Similarity	Source
None found