Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

Oct-10-2024, 08:00:10 GMT–Neural Information Processing Systems

We study sequential decision-making problems in which each agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve discounted infinite-horizon Constrained Markov Decision Processes (CMDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs, which updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent. Even though the underlying maximization involves a nonconcave objective function and a nonconvex constraint set under the softmax policy parametrization, we prove that our method achieves global convergence with sublinear rates for both the optimality gap and the constraint violation. These convergence rates are independent of the size of the state-action space, i.e., they are dimension-free.
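The primal-dual structure described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation and rests on several assumptions not stated above: the transition model is known, so values are computed exactly; the constraint has the form V_g(rho) >= b; and the softmax natural policy gradient step is written in its well-known closed form, where each logit moves by the advantage scaled by eta1 / (1 - gamma). The names `evaluate`, `npg_pd`, `lam_max` and all step sizes are hypothetical choices made for illustration.

```python
import numpy as np

def evaluate(P, c, pi, gamma):
    """Exact Q and V for a per-step signal c (S x A) under policy pi (S x A).

    P has shape (S, A, S): P[s, a, t] is the probability of moving s -> t under a.
    """
    S, _ = c.shape
    P_pi = np.einsum("sa,sat->st", pi, P)      # state transitions induced by pi
    c_pi = (pi * c).sum(axis=1)                # expected one-step signal per state
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    Q = c + gamma * P @ V                      # (S, A, S) @ (S,) -> (S, A)
    return Q, V

def softmax(theta):
    z = theta - theta.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def npg_pd(P, r, g, b, rho, gamma=0.9, eta1=1.0, eta2=0.1, lam_max=10.0, iters=200):
    """Sketch of a natural policy gradient primal-dual loop (assumed form).

    Maximizes V_r(rho) subject to V_g(rho) >= b via the Lagrangian V_r + lam * (V_g - b).
    """
    S, A = r.shape
    theta = np.zeros((S, A))                   # softmax logits, uniform initial policy
    lam = 0.0                                  # dual variable (Lagrange multiplier)
    for _ in range(iters):
        pi = softmax(theta)
        Qr, Vr = evaluate(P, r, pi, gamma)
        Qg, Vg = evaluate(P, g, pi, gamma)
        # Advantage of the Lagrangian: reward part plus lam times utility part.
        adv = (Qr - Vr[:, None]) + lam * (Qg - Vg[:, None])
        # Primal step: natural policy gradient ascent on the logits.
        theta = theta + eta1 / (1.0 - gamma) * adv
        # Dual step: projected sub-gradient descent onto [0, lam_max].
        lam = float(np.clip(lam - eta2 * (rho @ Vg - b), 0.0, lam_max))
    return softmax(theta), lam
```

Calling `npg_pd(P, r, g, b, rho)` with arrays of the shapes noted in the docstrings returns the final policy and multiplier. Projecting the multiplier onto a bounded interval keeps the dual iterate from growing without limit; the bound `lam_max` here is an arbitrary placeholder rather than a value derived from the problem. The sketch illustrates the update structure only and does not reproduce the convergence analysis.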

constrained markov decision process, convergence, natural policy gradient primal-dual method

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (0.61)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.64)