Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

Ding, Dongsheng, Zhang, Kaiqing, Duan, Jiali, Başar, Tamer, Jovanović, Mihailo R.

Oct-17-2023–arXiv.org Artificial Intelligence

We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finitesample complexity guarantees for two sample-based NPG-PD algorithms. Finally, we use computational experiments to showcase the merits and the effectiveness of our approach. Keywords: Constrained Markov decision processes; Natural policy gradient; Constrained nonconvex optimization; Method of Lagrange multipliers; Primal-dual algorithms.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

Oct-17-2023

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East
  - Jordan (0.04)
- Europe
  - France > Provence-Alpes-Côte d'Azur
    - Alpes-Maritimes > Nice (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- North America > United States
  - California
    - Alameda County > Berkeley (0.04)
    - Los Angeles County > Los Angeles (0.28)
  - Illinois > Champaign County
    - Champaign (0.04)
  - Maryland > Prince George's County
    - College Park (0.14)
  - Pennsylvania > Philadelphia County
    - Philadelphia (0.14)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.45)
- Government (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.86)
    - Reinforcement Learning (1.00)
    - Statistical Learning (1.00)
  - Representation & Reasoning > Optimization (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found