Independent Policy Gradient Methods for Competitive Reinforcement Learning
Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
Neural Information Processing Systems
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
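The dynamics described in the abstract can be illustrated in the simplest special case: a single-state zero-sum matrix game, where each player independently runs projected gradient ascent/descent on its own mixed strategy while the step sizes follow a two-timescale rule (one player learns much faster than the other). The sketch below is illustrative only; the payoff matrix `A`, the step sizes `eta_x`/`eta_y`, and the iteration count are assumptions, not values from the paper, and the full result concerns general stochastic games rather than this toy setting.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# Toy zero-sum matrix game (assumed payoffs, not from the paper).
# Row player x maximizes x^T A y; column player y minimizes it.
A = np.array([[2.0, 1.0],
              [0.0, -1.0]])

# Two-timescale rule: the players use unequal learning rates.
eta_x, eta_y = 0.05, 0.2

x = np.array([0.5, 0.5])  # row player's mixed strategy
y = np.array([0.5, 0.5])  # column player's mixed strategy

for _ in range(2000):
    # Each player updates independently from its own payoff gradient.
    x = project_simplex(x + eta_x * (A @ y))      # gradient ascent
    y = project_simplex(y - eta_y * (A.T @ x))    # gradient descent

# For this matrix the min-max equilibrium is the pure pair (row 1, column 2),
# and the independent updates drive (x, y) to it.
print(x, y)
```

Here the first row dominates, so the iterates reach the pure equilibrium regardless of the timescale separation; the two-timescale step sizes matter in the general settings the paper analyzes, where equal rates can cycle.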