Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

Neural Information Processing Systems 

We study sequential decision-making problems in which each agent aims to maximize the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve discounted infinite-horizon Constrained Markov Decision Processes (CMDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method for CMDPs, which updates the primal variable via natural policy gradient ascent and the dual variable via projected sub-gradient descent.
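The primal-dual scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small tabular CMDP with a known model, a softmax policy, exact policy evaluation, and a constraint of the form V_g(rho) >= b. The names `evaluate`, `npg_pd`, `eta_pi`, `eta_lam`, and `lam_max` are hypothetical and chosen only for illustration.

```python
import numpy as np


def evaluate(P, c, pi, gamma):
    """Exact Q and V for a per-step signal c[s, a] under policy pi[s, a].

    P has shape (S, A, S) with P[s, a, s'] the transition probability.
    """
    S, _ = c.shape
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    c_pi = (pi * c).sum(axis=1)             # expected per-step signal per state
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    Q = c + gamma * (P @ V)
    return Q, V


def npg_pd(P, r, g, b, rho, gamma=0.9, eta_pi=1.0, eta_lam=0.1,
           lam_max=10.0, iters=200):
    """Sketch of a natural policy gradient primal-dual loop (assumed setup)."""
    S, A = r.shape
    theta = np.zeros((S, A))                # softmax logits (primal variable)
    lam = 0.0                               # Lagrange multiplier (dual variable)
    pi = np.full((S, A), 1.0 / A)
    for _ in range(iters):
        Qr, Vr = evaluate(P, r, pi, gamma)
        Qg, Vg = evaluate(P, g, pi, gamma)
        # Advantage of the Lagrangian "reward" r + lam * g.
        adv = (Qr - Vr[:, None]) + lam * (Qg - Vg[:, None])
        # Primal step: for a tabular softmax policy, natural gradient ascent
        # reduces to adding the scaled advantage to the logits.
        theta = theta + eta_pi / (1.0 - gamma) * adv
        z = theta - theta.max(axis=1, keepdims=True)   # numerical stability
        pi = np.exp(z)
        pi /= pi.sum(axis=1, keepdims=True)
        # Dual step: projected sub-gradient descent on lam, clipped to
        # the assumed interval [0, lam_max].
        lam = float(np.clip(lam - eta_lam * (rho @ Vg - b), 0.0, lam_max))
    return pi, lam
```

Here the multiplier grows while the utility value falls short of `b` and shrinks toward zero once the constraint holds with slack, which is what pushes the policy toward utility-earning actions only when needed. A sample-based variant would replace `evaluate` with estimated advantages.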
