Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
Yuan, Rui, Du, Simon S., Gower, Robert M., Lazaric, Alessandro, Xiao, Lin
–arXiv.org Artificial Intelligence
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
arXiv.org Artificial Intelligence
Feb-21-2023
- Country:
- North America
- United States
- Washington > King County
- Seattle (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- New York > New York County
- New York City (0.14)
- New Jersey > Mercer County
- Princeton (0.04)
- Washington > King County
- Puerto Rico > San Juan
- San Juan (0.04)
- Canada > Quebec
- Montreal (0.04)
- United States
- Europe
- Russia (0.04)
- United Kingdom
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- England
- Oxfordshire > Oxford (0.04)
- Cambridgeshire > Cambridge (0.04)
- Scotland > City of Edinburgh
- France > Hauts-de-France
- Asia
- Russia (0.04)
- Middle East > Jordan (0.04)
- North America
- Genre:
- Research Report (0.82)