Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning

May-27-2025, 19:56:10 GMT, Neural Information Processing Systems

Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems in which agents must maximize the expected return while satisfying domain-specific constraints, which are often formulated on expected costs. In this setting, policy-based methods are widely used because they offer several advantages in continuous-control problems. These methods search the policy space with an action-based or a parameter-based exploration strategy, depending on whether they directly learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternating ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions, improving and generalizing existing results.
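
To make the idea of an alternating primal-dual scheme with dual-variable regularization concrete, here is a minimal sketch on a toy problem. It is not the C-PG algorithm from the paper: it uses exact gradients rather than sampled policy gradients, and the objective, the constraint, the step sizes `eta` and `zeta`, and the regularization weight `omega` are all assumptions chosen for illustration. The toy task is to minimize f(theta) = (theta - 2)^2 subject to g(theta) = theta - 1 <= 0, so the primal step is a descent step and the dual step an ascent step; in a return-maximization setting the roles are mirrored.

```python
def regularized_primal_dual(omega=0.01, eta=0.05, zeta=0.05, iters=5000):
    """Alternating gradient descent/ascent on the regularized Lagrangian
    L(theta, lam) = f(theta) + lam * g(theta) - (omega / 2) * lam**2,
    with f(theta) = (theta - 2)**2 and g(theta) = theta - 1 (toy choices).
    """
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        # Primal step: gradient descent on theta with the multiplier fixed.
        grad_theta = 2.0 * (theta - 2.0) + lam
        theta -= eta * grad_theta

        # Dual step: gradient ascent on lam using the updated theta.
        # The -omega * lam term comes from the dual regularizer.
        grad_lam = (theta - 1.0) - omega * lam
        # Project back onto lam >= 0, since multipliers are non-negative.
        lam = max(0.0, lam + zeta * grad_lam)
    return theta, lam


theta, lam = regularized_primal_dual()
print(f"theta = {theta:.4f}, lambda = {lam:.4f}")
```

For this toy problem the regularized saddle point can be computed by hand as theta = (1 + 4*omega) / (1 + 2*omega), and the final iterate (not an average of iterates) settles there. The regularizer makes the Lagrangian strongly concave in the multiplier, at the price of a small constraint violation of order `omega`.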

constrained reinforcement learning, last-iterate global convergence, policy gradient


Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)