An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Tamer Başar, Wotao Yin

Jan-24-2025, 13:09:11 GMT–Neural Information Processing Systems

In this paper, we revisit and improve the convergence analysis of policy gradient (PG) and natural PG (NPG) methods, and of their variance-reduced variants, under general smooth policy parametrizations. More specifically, assuming the Fisher information matrix of the policy is positive definite: i) we show that a state-of-the-art variance-reduced PG method, which had previously been shown only to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to the policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance reduction into the NPG update. Our improvements follow from the observation that the convergence analyses of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods.
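As an illustration of the NPG update mentioned in the abstract, the following is a minimal Python sketch. It is not the paper's algorithm and does not implement SRVR-NPG or any variance reduction. It assumes a single-step (bandit-style) problem, a one-dimensional Gaussian policy with parameters (mu, log_std), for which the Fisher information matrix is positive definite, and a caller-supplied reward function. The function names, the damping term, the step size, and the sample count are all illustrative assumptions. The vanilla PG direction is the Monte Carlo estimate g; the NPG direction preconditions g with the inverse of the estimated Fisher matrix F.

```python
import math
import random


def score(a, mu, log_std):
    """Gradient of log N(a; mu, exp(log_std)^2) with respect to (mu, log_std)."""
    var = math.exp(2.0 * log_std)
    return ((a - mu) / var, (a - mu) ** 2 / var - 1.0)


def estimate_pg_and_fisher(mu, log_std, reward, n, rng):
    """Monte Carlo estimates of the policy gradient g and the Fisher matrix F."""
    std = math.exp(log_std)
    g = [0.0, 0.0]
    F = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        a = rng.gauss(mu, std)          # sample an action from the policy
        s = score(a, mu, log_std)       # score function at that action
        r = reward(a)
        for i in range(2):
            g[i] += s[i] * r / n        # score-function (REINFORCE) estimator
            for j in range(2):
                F[i][j] += s[i] * s[j] / n  # average outer product of scores
    return g, F


def solve_2x2(F, g, damping=1e-6):
    """Solve (F + damping * I) x = g by Cramer's rule (damping is an assumption)."""
    a, b = F[0][0] + damping, F[0][1]
    c, d = F[1][0], F[1][1] + damping
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def npg_step(mu, log_std, reward, lr=0.1, n=2000, rng=None):
    """One natural-gradient ascent step: theta <- theta + lr * F^{-1} g."""
    rng = rng or random.Random(0)
    g, F = estimate_pg_and_fisher(mu, log_std, reward, n, rng)
    direction = solve_2x2(F, g)
    return mu + lr * direction[0], log_std + lr * direction[1]
```

For example, with the hypothetical reward `lambda a: -(a - 3.0) ** 2`, calling `npg_step(0.0, 0.0, reward)` moves mu from 0 toward the maximizer at 3. Solving the linear system rather than inverting F explicitly is a common design choice, and the small damping term guards against a poorly conditioned sample estimate.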

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Reinforcement Learning (0.48)
    - Statistical Learning (0.68)
  - Representation & Reasoning > Mathematical & Statistical Methods (0.46)
