Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

Jun-8-2021–arXiv.org Machine Learning

Natural policy gradient (NPG) methods with function approximation achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, theoretical understanding of their convergence behavior remains limited in the function approximation setting. In this paper, we perform a finite-time analysis of NPG with linear function approximation and softmax parameterization, and prove for the first time that the widely used entropy regularization method, which encourages exploration, leads to a linear convergence rate. We adopt a Lyapunov drift analysis to prove the convergence results and to explain the effectiveness of entropy regularization in improving the convergence rate.
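To make the setting in the abstract concrete, the following is a minimal sketch of an entropy-regularized NPG iteration with a softmax (log-linear) policy over linear features. It is not taken from the paper and rests on several assumptions: the model (`P`, `r`) is known, so the regularized Q-function is evaluated exactly rather than sampled; the critic weights `w` are fit by an unweighted least-squares regression of Q on the features (a Q-NPG-style simplification); and the update is written as `theta + eta * (w - lam * theta)`. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax_policy(theta, phi):
    """Log-linear policy pi(a|s) proportional to exp(theta . phi(s, a)).

    phi has shape (S, A, d); the result has shape (S, A).
    """
    logits = phi @ theta
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def soft_q(pi, P, r, gamma, lam):
    """Exact entropy-regularized evaluation of pi (assumes a known model).

    V(s)   = sum_a pi(a|s) * (Q(s, a) - lam * log pi(a|s))
    Q(s,a) = r(s, a) + gamma * sum_s' P(s'|s, a) * V(s')
    """
    S = r.shape[0]
    r_pi = (pi * (r - lam * np.log(pi))).sum(axis=1)   # regularized reward
    P_pi = np.einsum("sa,sat->st", pi, P)              # state transitions under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    q = r + gamma * P @ v
    return q, v


def npg_step(theta, phi, P, r, gamma, lam, eta):
    """One regularized update (assumed form): theta + eta * (w - lam * theta)."""
    pi = softmax_policy(theta, phi)
    q, v = soft_q(pi, P, r, gamma, lam)
    X = phi.reshape(-1, phi.shape[-1])                 # (S*A, d) design matrix
    # Assumption: unweighted least-squares fit of Q on the features.
    w, *_ = np.linalg.lstsq(X, q.reshape(-1), rcond=None)
    return theta + eta * (w - lam * theta), v


def run_npg(phi, P, r, gamma, lam, eta, iters):
    """Run `iters` updates from theta = 0; returns theta and the last evaluated V."""
    theta = np.zeros(phi.shape[-1])
    v = None
    for _ in range(iters):
        theta, v = npg_step(theta, phi, P, r, gamma, lam, eta)
    return theta, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, d = 4, 3, 5
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)                  # rows sum to one
    r = rng.random((S, A))
    phi = rng.standard_normal((S, A, d))               # random linear features
    theta, v = run_npg(phi, P, r, gamma=0.9, lam=0.5, eta=1.0, iters=50)
    print("regularized values of the last evaluated policy:", v)
```

With one-hot features (`d = S * A`), the least-squares fit recovers Q exactly and the step reduces to the familiar tabular form in which the new policy is proportional to `pi ** (1 - eta * lam) * exp(eta * Q)`. The `-lam * theta` term is where entropy regularization enters the update in this sketch: it shrinks the previous parameters geometrically at every step.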

approximation, convergence, function approximation

Country:
- North America > United States
  - Illinois > Champaign County > Urbana (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Zürich
    - Zürich (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.64)

Industry:
- Education (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (0.67)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (1.00)
