constrained reinforcement learning
Constrained Reinforcement Learning Has Zero Duality Gap
Autonomous agents must often deal with conflicting requirements, such as completing tasks using the least amount of time/energy, learning multiple tasks, or dealing with multiple opponents. In the context of reinforcement learning (RL), these problems are addressed by (i) designing a reward function that simultaneously describes all requirements or (ii) combining modular value functions that encode them individually. Though effective, these methods have critical downsides. Designing good reward functions that balance different objectives is challenging, especially as the number of objectives grows. Moreover, implicit interference between goals may lead to performance plateaus as they compete for resources, particularly when training on-policy. Similarly, selecting parameters to combine value functions is at least as hard as designing an all-encompassing reward, given that the effect of their values on the overall policy is not straightforward.
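The trade-off described above (hand-tuning weights versus treating each requirement as a constraint whose multiplier is learned) can be sketched on a toy two-armed bandit. All numbers, names, and step sizes below are illustrative assumptions, not taken from any of the papers listed here:

```python
import numpy as np

# Toy two-armed bandit: arm 0 has high task reward but low "safety" reward.
# Instead of hand-tuning a weight w in r = r_task + w * r_safety, we impose
# the constraint V_safety(pi) >= c and learn a Lagrange multiplier by dual
# ascent alongside the policy.
task_reward = np.array([1.0, 0.6])     # expected task reward per arm
safety_reward = np.array([0.2, 0.9])   # expected auxiliary reward per arm
c = 0.8                                # required safety level

theta = np.zeros(2)   # softmax policy logits
lam = 0.0             # Lagrange multiplier for the safety constraint
avg_pi = np.zeros(2)  # primal-dual methods are judged on the averaged policy
T = 20000
for _ in range(T):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    combined = task_reward + lam * safety_reward   # Lagrangian reward
    v = pi @ combined
    theta += 0.1 * pi * (combined - v)             # exact policy gradient step
    lam = max(0.0, lam + 0.05 * (c - pi @ safety_reward))  # dual ascent
    avg_pi += pi
avg_pi /= T
# the multiplier rises until the averaged policy favors the safe arm enough
# to meet V_safety >= c, with no hand-chosen weight anywhere in the setup
```

The multiplier plays the role of the balancing weight, but it is adapted by the violation signal rather than chosen by the designer.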
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated on expected costs. In this setting, policy-based methods are widely used since they come with several advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or parameter-based exploration strategy, depending on whether they directly learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. In this paper, we propose a general framework for addressing CRL problems via gradient-based primal-dual algorithms, relying on an alternate ascent/descent scheme with dual-variable regularization. We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions, improving and generalizing existing results.
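The role of the dual-variable regularization in the alternate ascent/descent scheme can be illustrated numerically. The function name, step sizes, and constants below are assumptions for illustration, not the paper's C-PG implementation:

```python
# Regularizing the dual ascent step on the constraint J_cost <= budget,
#   lam <- max(0, lam + eta * (J_cost - budget - tau * lam)),
# makes the dual dynamics a contraction: under a persistent violation the
# unregularized iterate grows without bound, while the regularized one
# converges to a fixed point. This contraction is the usual mechanism behind
# last-iterate (rather than averaged-iterate) convergence guarantees.

def dual_step(lam, cost_value, budget, eta=0.1, tau=0.05):
    """One regularized dual ascent step."""
    return max(0.0, lam + eta * (cost_value - budget - tau * lam))

lam = 0.0
for _ in range(5000):
    lam = dual_step(lam, cost_value=1.2, budget=1.0)
# fixed point: (cost_value - budget) / tau = (1.2 - 1.0) / 0.05 = 4.0
```

The regularization biases the solution slightly (the fixed point depends on tau), which is why such guarantees typically trade a small constraint-satisfaction error for last-iterate convergence.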
Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time
We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criteria. A TSR criteria requires the cost of a policy to be computable recursively over both time and (state) space, which includes classical expectation, almost sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research: polynomial-time approximability is possible for 1) anytime-constrained policies, 2) almost-sure-constrained policies, and 3) deterministic expectation-constrained policies.
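The budget-augmentation idea behind value-demand augmentation can be sketched on a toy problem: augment the state with the remaining cost budget and run dynamic programming over (time, budget). This is a minimal sketch under assumed toy numbers; the paper's FPTAS additionally rounds the budget space, which is omitted here:

```python
from functools import lru_cache

# Tiny finite-horizon problem with an anytime constraint: cumulative cost
# may never exceed BUDGET. Augmenting the state with the remaining budget
# turns the constrained problem into an unconstrained DP, and the resulting
# policy (the argmax at each (t, budget) pair) is deterministic.
ACTIONS = {            # action: (reward, cost) -- assumed toy numbers
    "fast": (2, 2),
    "slow": (1, 1),
    "stop": (0, 0),
}
HORIZON, BUDGET = 3, 4

@lru_cache(maxsize=None)
def best(t, budget):
    """Max total reward from step t with `budget` cost still allowed."""
    if t == HORIZON:
        return 0
    vals = []
    for name, (r, c) in ACTIONS.items():
        if c <= budget:  # anytime constraint: never exceed remaining budget
            vals.append(r + best(t + 1, budget - c))
    return max(vals)

# best(0, BUDGET): e.g. fast, fast, stop keeps cumulative cost <= 4
```

The cost of this DP is polynomial in the number of distinct budget values, which is why the rounding step matters for a fully polynomial-time scheme.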
CRLLK: Constrained Reinforcement Learning for Lane Keeping in Autonomous Driving
Gao, Xinwei, Singh, Arambam James, Royyuru, Gangadhar, Yuhas, Michael, Easwaran, Arvind
Lane keeping in autonomous driving systems requires scenario-specific weight tuning for different objectives. We formulate lane-keeping as a constrained reinforcement learning problem, where weight coefficients are automatically learned along with the policy, eliminating the need for scenario-specific tuning. Empirically, our approach outperforms traditional RL in efficiency and reliability. Additionally, real-world demonstrations validate its practical value for autonomous driving.
Reviews: Constrained Reinforcement Learning Has Zero Duality Gap
The paper studies a form of constrained reinforcement learning in which the constraints are bounds on the value functions for auxiliary rewards. This allows a more expressive formulation than the common approach of defining the reward as a linear combination of multiple objectives. The authors show that under certain conditions, the constrained optimization problem has zero duality gap, implying that a solution can be found by solving the dual optimization problem, which is convex. The authors also extend this analysis to the case in which the policy is parameterized. Theorem 1 assumes that Slater's condition holds, which is problematic for two reasons. Slater's condition is usually defined for convex constraints, but the authors specifically state that the constraints in PI are non-convex.
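The zero-duality-gap claim can be checked numerically on a tiny example: a two-armed bandit with assumed payoffs, standing in for the paper's general MDP setting:

```python
import numpy as np

# Dual function d(lam) = max_pi [ V_r(pi) + lam * (V_g(pi) - c) ] is always
# convex in lam, and weak duality gives min_lam d(lam) >= primal optimum.
# The paper's claim is that for this class of problems the bound is tight.
r = np.array([1.0, 0.6])   # task values per arm (assumed numbers)
g = np.array([0.2, 0.9])   # constraint values per arm
c = 0.8                    # constraint threshold: V_g(pi) >= c

def dual(lam):
    # the inner max over stochastic policies is attained at a single arm
    return np.max(r + lam * (g - c))

# primal optimum: arm 0 has higher reward, so the optimal stochastic policy
# mixes the arms until the constraint is exactly tight
pi1 = (c - g[0]) / (g[1] - g[0])
primal = (1 - pi1) * r[0] + pi1 * r[1]

dual_opt = min(dual(l) for l in np.linspace(0, 3, 3001))
# with randomized policies the gap closes (up to the grid resolution)
```

The stochasticity of the policy is what closes the gap here; restricting to deterministic policies would reintroduce it, which is consistent with the convexification role that randomization plays in the paper's argument.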
Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare
Fang, Nan, Liu, Guiliang, Gong, Wei
Reinforcement Learning (RL) applied in healthcare can lead to unsafe medical decisions and treatment, such as excessive dosages or abrupt changes, often due to agents overlooking common-sense constraints. Consequently, Constrained Reinforcement Learning (CRL) is a natural choice for safe decisions. However, specifying the exact cost function is inherently difficult in healthcare. Recent Inverse Constrained Reinforcement Learning (ICRL) is a promising approach that infers constraints from expert demonstrations, but these settings do not align with the practical requirements of a decision-making system in healthcare, where decisions rely on historical treatments recorded in an offline dataset. To tackle these issues, we propose the Constraint Transformer (CT). Specifically, we utilize a causal attention mechanism to incorporate historical decisions and observations into the constraint modeling, while employing a Non-Markovian layer for weighted constraints to capture critical states. In multiple medical scenarios, empirical results demonstrate that CT can capture unsafe states and achieve strategies that approximate lower mortality rates, reducing the occurrence probability of unsafe behaviors.

In recent years, the doctor-to-patient ratio imbalance has drawn attention, with the U.S. having only 223.1 physicians per 100,000 people (Petterson et al., 2018). AI-assisted therapy emerges as a promising solution, offering timely diagnosis, personalized care, and reduced dependence on experienced physicians. Therefore, the development of an effective AI healthcare assistant is crucial. Reinforcement learning (RL) offers a promising approach to developing AI assistants by addressing sequential decision-making. However, this method can still lead to unsafe behaviors, such as administering excessive drug dosages or making inappropriate adjustments to medical parameters.

Table 1: Proportion of unsafe vaso doses recommended by physician and DDPG policy. Doses range from 0.1 to 0.2 µg/(kg·min), with doses above 0.5 considered high (Bassi et al., 2013).
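The causal attention over historical decisions mentioned in the abstract can be sketched in a few lines. The dimensions, random weights, and single-head form are assumptions for illustration, not the paper's Constraint Transformer architecture:

```python
import numpy as np

def causal_attention(X, Wq, Wk, Wv):
    """Single-head self-attention where step t may only attend to steps <= t."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    mask = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(mask, scores, -np.inf)   # hide future steps
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
T, d = 5, 8                      # history length, embedding size (assumed)
X = rng.normal(size=(T, d))      # embedded (observation, action) history
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
H = causal_attention(X, *W)      # per-step features for a constraint/cost head
# H[t] depends only on steps <= t, so a cost predicted at step t never peeks
# at future treatment decisions
```

The causal mask is what makes the constraint model usable at decision time: each step's constraint estimate is a function of the history actually available then.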