Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

Xu, Yang, Ganesh, Swetha, Mondal, Washim Uddin, Bai, Qinbo, Aggarwal, Vaneet

arXiv.org Artificial Intelligence 

This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) with general parametrization. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $τ_{\mathrm{mix}}$, is known to the learner. In absence of knowledge of $τ_{\mathrm{mix}}$, the achievable rates change to $\tilde{\mathcal{O}}(1/T^{0.5-ε})$ provided that $T \geq \tilde{\mathcal{O}}\left(τ_{\mathrm{mix}}^{2/ε}\right)$. Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found