Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

Jun-19-2026, 07:16:18 GMT–Neural Information Processing Systems

This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) under general parametrized policies with smooth and bounded policy gradients. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of O(1/ T) over a horizon of length T when the mixing time, τmix, is known to the learner. In absence of knowledge of τmix, the achievable rates change to O(1/T0.5 ϵ) provided that T O τ2/ϵmix . Our results match the theoretical lower bound for Markov Decision Processes and establish a new benchmark in the theoretical exploration of average reward CMDPs.

assumption 4, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Jun-19-2026, 07:16:18 GMT

Conferences PDF

Add feedback

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.66)

Industry:
- Transportation (0.46)
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (0.94)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found