Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
–Neural Information Processing Systems
This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) under general parametrized policies with smooth and bounded policy gradients. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of O(1/ T) over a horizon of length T when the mixing time, τmix, is known to the learner. In absence of knowledge of τmix, the achievable rates change to O(1/T0.5 ϵ) provided that T O τ2/ϵmix . Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.
Neural Information Processing Systems
Jun-19-2026, 07:16:18 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.66)
- Research Report
- Industry:
- Transportation (0.46)
- Information Technology (0.46)