Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
Neural Information Processing Systems
We study sequential decision-making problems in which each agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We apply the natural policy gradient method to discounted infinite-horizon constrained Markov decision processes (CMDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs, which updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent.
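The primal-dual scheme described above can be illustrated with a minimal tabular sketch. It is not the authors' implementation: it assumes a softmax policy with exact policy evaluation, a constraint of the form V_g(rho) >= b, and a dual projection interval [0, lam_max]. The names `evaluate`, `npg_pd`, `eta_pi`, `eta_lam`, `lam_max`, and the random toy CMDP are hypothetical choices made for illustration only.

```python
import numpy as np

def evaluate(P, c, pi, gamma):
    """Exact Q and V of policy pi for a per-step signal c (reward or utility).

    P has shape (S, A, S), c and pi have shape (S, A).
    """
    S, _ = c.shape
    P_pi = np.einsum("sa,sat->st", pi, P)      # state-to-state kernel under pi
    c_pi = (pi * c).sum(axis=1)                # expected one-step signal under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    Q = c + gamma * (P @ V)                    # shape (S, A)
    return Q, V

def npg_pd(P, r, g, b, rho, gamma=0.9, eta_pi=1.0, eta_lam=0.1,
           lam_max=10.0, iters=200):
    """Sketch of a natural policy gradient primal-dual loop (assumed form)."""
    S, A = r.shape
    theta = np.zeros((S, A))                   # softmax logits, uniform start
    lam = 0.0                                  # dual variable
    for _ in range(iters):
        z = theta - theta.max(axis=1, keepdims=True)
        pi = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        Q_r, V_r = evaluate(P, r, pi, gamma)
        Q_g, V_g = evaluate(P, g, pi, gamma)
        # Advantage of the Lagrangian signal r + lam * g.
        adv = (Q_r - V_r[:, None]) + lam * (Q_g - V_g[:, None])
        # Primal step: for tabular softmax policies, natural policy gradient
        # ascent reduces to adding the scaled advantage to the logits.
        theta = theta + eta_pi / (1.0 - gamma) * adv
        # Dual step: projected sub-gradient descent on lam, using the
        # constraint value of the policy evaluated above.
        lam = float(np.clip(lam - eta_lam * (rho @ V_g - b), 0.0, lam_max))
    z = theta - theta.max(axis=1, keepdims=True)
    pi = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return pi, lam

# Hypothetical toy CMDP: 3 states, 2 actions, random dynamics and signals.
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((3, 2))
g = rng.random((3, 2))
rho = np.full(3, 1.0 / 3.0)                    # initial state distribution
pi, lam = npg_pd(P, r, g, b=5.0, rho=rho)
print(pi, lam)
```

Keeping the logits rather than the probabilities as the iterate avoids taking the logarithm of probabilities that have underflowed to zero. The dual variable grows while the utility constraint is violated and shrinks toward zero once it is satisfied.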
Feb-8-2026, 14:45:34 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America
- Canada (0.04)
- United States
- California (0.14)
- Illinois (0.04)
- Industry:
- Government (0.68)
- Health & Medicine (0.93)