constrained reinforcement learning
Constrained Reinforcement Learning Has Zero Duality Gap
Autonomous agents must often deal with conflicting requirements, such as completing tasks using the least amount of time/energy, learning multiple tasks, or dealing with multiple opponents. In the context of reinforcement learning (RL), these problems are addressed by (i) designing a reward function that simultaneously describes all requirements or (ii) combining modular value functions that encode them individually. Though effective, these methods have critical downsides. Designing good reward functions that balance different objectives is challenging, especially as the number of objectives grows. Moreover, implicit interference between goals may lead to performance plateaus as they compete for resources, particularly when training on-policy. Similarly, selecting parameters to combine value functions is at least as hard as designing an all-encompassing reward, given that the effect of their values on the overall policy is not straightforward.
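The trade-off described above (hand-tuning weights versus treating each requirement as a constraint whose multiplier is learned) can be sketched on a toy two-armed bandit. All numbers, names, and step sizes below are illustrative assumptions, not taken from any of the papers listed here:

```python
import numpy as np

# Toy two-armed bandit: arm 0 has high task reward but low "safety" reward.
# Instead of hand-tuning a weight w in r = r_task + w * r_safety, we impose
# the constraint V_safety(pi) >= c and learn a Lagrange multiplier by dual
# ascent alongside the policy.
task_reward = np.array([1.0, 0.6])     # expected task reward per arm
safety_reward = np.array([0.2, 0.9])   # expected auxiliary reward per arm
c = 0.8                                # required safety level

theta = np.zeros(2)   # softmax policy logits
lam = 0.0             # Lagrange multiplier for the safety constraint
avg_pi = np.zeros(2)  # primal-dual methods are judged on the averaged policy
T = 20000
for _ in range(T):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    combined = task_reward + lam * safety_reward   # Lagrangian reward
    v = pi @ combined
    theta += 0.1 * pi * (combined - v)             # exact policy gradient step
    lam = max(0.0, lam + 0.05 * (c - pi @ safety_reward))  # dual ascent
    avg_pi += pi
avg_pi /= T
# the multiplier rises until the averaged policy favors the safe arm enough
# to meet V_safety >= c, with no hand-chosen weight anywhere in the setup
```

The multiplier plays the role of the balancing weight, but it is adapted by the violation signal rather than chosen by the designer.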
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated on expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they directly learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternate ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions, improving and generalizing existing results.
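The role of the dual-variable regularization in the alternate ascent/descent scheme can be illustrated numerically. The function name, step sizes, and constants below are assumptions for illustration, not the paper's C-PG implementation:

```python
# Regularizing the dual ascent step on the constraint J_cost <= budget,
#   lam <- max(0, lam + eta * (J_cost - budget - tau * lam)),
# makes the dual dynamics a contraction: under a persistent violation the
# unregularized iterate grows without bound, while the regularized one
# converges to a fixed point. This contraction is the usual mechanism behind
# last-iterate (rather than averaged-iterate) convergence guarantees.

def dual_step(lam, cost_value, budget, eta=0.1, tau=0.05):
    """One regularized dual ascent step."""
    return max(0.0, lam + eta * (cost_value - budget - tau * lam))

lam = 0.0
for _ in range(5000):
    lam = dual_step(lam, cost_value=1.2, budget=1.0)
# fixed point: (cost_value - budget) / tau = (1.2 - 1.0) / 0.05 = 4.0
```

The regularization biases the solution slightly (the fixed point depends on tau), which is why such guarantees typically trade a small constraint-satisfaction error for last-iterate convergence.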
Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time
We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criteria. A TSR criteria requires the cost of a policy to be computable recursively over both time and (state) space, which includes classical expectation, almost sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research: polynomial-time approximability is possible for 1) anytime-constrained policies, 2) almost-sure-constrained policies, and 3) deterministic expectation-constrained policies.
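The budget-augmentation idea behind value-demand augmentation can be sketched on a toy problem: augment the state with the remaining cost budget and run dynamic programming over (time, budget). This is a minimal sketch under assumed toy numbers; the paper's FPTAS additionally rounds the budget space, which is omitted here:

```python
from functools import lru_cache

# Tiny finite-horizon problem with an anytime constraint: cumulative cost
# may never exceed BUDGET. Augmenting the state with the remaining budget
# turns the constrained problem into an unconstrained DP, and the resulting
# policy (the argmax at each (t, budget) pair) is deterministic.
ACTIONS = {            # action: (reward, cost) -- assumed toy numbers
    "fast": (2, 2),
    "slow": (1, 1),
    "stop": (0, 0),
}
HORIZON, BUDGET = 3, 4

@lru_cache(maxsize=None)
def best(t, budget):
    """Max total reward from step t with `budget` cost still allowed."""
    if t == HORIZON:
        return 0
    vals = []
    for name, (r, c) in ACTIONS.items():
        if c <= budget:  # anytime constraint: never exceed remaining budget
            vals.append(r + best(t + 1, budget - c))
    return max(vals)

# best(0, BUDGET): e.g. fast, fast, stop keeps cumulative cost <= 4
```

The cost of this DP is polynomial in the number of distinct budget values, which is why the rounding step matters for a fully polynomial-time scheme.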
CRLLK: Constrained Reinforcement Learning for Lane Keeping in Autonomous Driving
Gao, Xinwei, Singh, Arambam James, Royyuru, Gangadhar, Yuhas, Michael, Easwaran, Arvind
Lane keeping in autonomous driving systems requires scenario-specific weight tuning for different objectives. We formulate lane-keeping as a constrained reinforcement learning problem, where weight coefficients are automatically learned along with the policy, eliminating the need for scenario-specific tuning. Empirically, our approach outperforms traditional RL in efficiency and reliability. Additionally, real-world demonstrations validate its practical value for autonomous driving.
Reviews: Constrained Reinforcement Learning Has Zero Duality Gap
The paper studies a form of constrained reinforcement learning in which the constraints are bounds on the value functions for auxiliary rewards. This allows a more expressive formulation than the common approach of defining the reward as a linear combination of multiple objectives. The authors show that under certain conditions, the constrained optimization problem has zero duality gap, implying that a solution can be found by solving the dual optimization problem, which is convex. The authors also extend this analysis to the case in which the policy is parameterized. Theorem 1 assumes that Slater's condition holds, which is problematic for two reasons. Slater's condition is usually defined for convex constraints, but the authors specifically state that the constraints in PI are non-convex.
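The zero-duality-gap claim can be checked numerically on a tiny example: a two-armed bandit with assumed payoffs, standing in for the paper's general MDP setting:

```python
import numpy as np

# Dual function d(lam) = max_pi [ V_r(pi) + lam * (V_g(pi) - c) ] is always
# convex in lam, and weak duality gives min_lam d(lam) >= primal optimum.
# The paper's claim is that for this class of problems the bound is tight.
r = np.array([1.0, 0.6])   # task values per arm (assumed numbers)
g = np.array([0.2, 0.9])   # constraint values per arm
c = 0.8                    # constraint threshold: V_g(pi) >= c

def dual(lam):
    # the inner max over stochastic policies is attained at a single arm
    return np.max(r + lam * (g - c))

# primal optimum: arm 0 has higher reward, so the optimal stochastic policy
# mixes the arms until the constraint is exactly tight
pi1 = (c - g[0]) / (g[1] - g[0])
primal = (1 - pi1) * r[0] + pi1 * r[1]

dual_opt = min(dual(l) for l in np.linspace(0, 3, 3001))
# with randomized policies the gap closes (up to the grid resolution)
```

The stochasticity of the policy is what closes the gap here; restricting to deterministic policies would reintroduce it, which is consistent with the convexification role that randomization plays in the paper's argument.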
Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare
Fang, Nan, Liu, Guiliang, Gong, Wei
Reinforcement Learning (RL) applied in healthcare can lead to unsafe medical decisions and treatment, such as excessive dosages or abrupt changes, often due to agents overlooking common-sense constraints. Consequently, Constrained Reinforcement Learning (CRL) is a natural choice for safe decisions. However, specifying the exact cost function is inherently difficult in healthcare. Recent Inverse Constrained Reinforcement Learning (ICRL) is a promising approach that infers constraints from expert demonstrations, but these settings do not align with the practical requirements of a decision-making system in healthcare, where decisions rely on historical treatments recorded in an offline dataset. To tackle these issues, we propose the Constraint Transformer (CT). Specifically, we utilize a causal attention mechanism to incorporate historical decisions and observations into the constraint modeling, while employing a Non-Markovian layer for weighted constraints to capture critical states. In multiple medical scenarios, empirical results demonstrate that CT can capture unsafe states and achieve strategies that approximate lower mortality rates, reducing the occurrence probability of unsafe behaviors.

In recent years, the doctor-to-patient ratio imbalance has drawn attention, with the U.S. having only 223.1 physicians per 100,000 people (Petterson et al., 2018). AI-assisted therapy emerges as a promising solution, offering timely diagnosis, personalized care, and reduced dependence on experienced physicians. Therefore, the development of an effective AI healthcare assistant is crucial. Reinforcement learning (RL) offers a promising approach to developing AI assistants by addressing sequential decision-making. However, this method can still lead to unsafe behaviors, such as administering excessive drug dosages or making inappropriate adjustments to medical parameters.

Table 1: Proportion of unsafe vaso doses recommended by physician and DDPG policy. Doses range from 0.1 to 0.2 µg/(kg·min), with doses above 0.5 considered high (Bassi et al., 2013).
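The causal attention over historical decisions mentioned in the abstract can be sketched in a few lines. The dimensions, random weights, and single-head form are assumptions for illustration, not the paper's Constraint Transformer architecture:

```python
import numpy as np

def causal_attention(X, Wq, Wk, Wv):
    """Single-head self-attention where step t may only attend to steps <= t."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    mask = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(mask, scores, -np.inf)   # hide future steps
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
T, d = 5, 8                      # history length, embedding size (assumed)
X = rng.normal(size=(T, d))      # embedded (observation, action) history
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
H = causal_attention(X, *W)      # per-step features for a constraint/cost head
# H[t] depends only on steps <= t, so a cost predicted at step t never peeks
# at future treatment decisions
```

The causal mask is what makes the constraint model usable at decision time: each step's constraint estimate is a function of the history actually available then.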