Reduced Policy Optimization for Continuous Control with Hard Constraints

Neural Information Processing Systems 

Recent advances in constrained reinforcement learning (RL) have endowed RL with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly when the hard constraints are non-convex. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints. Following the GRG method, RPO partitions actions into basic actions and nonbasic actions, with the basic actions produced by a policy network. It then computes the nonbasic actions from the obtained basic actions by solving the equations defined by the equality constraints. The policy network is updated by implicitly differentiating the nonbasic actions with respect to the basic actions.
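
To make the reduction concrete, below is a minimal sketch of the GRG-style mechanics the abstract describes: Newton-solving an equality constraint for the nonbasic action given a basic action, then forming the reduced gradient via the implicit function theorem. The toy constraint, the toy objective, and the helper names (solve_nonbasic, reduced_gradient) are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a GRG-style reduction on a hypothetical toy problem.
# The constraint h, the objective, and the helper names are assumptions
# for illustration only.

def h(a_B, a_N):
    # Toy equality constraint tying the actions together: a_N^2 + a_B - 2 = 0.
    return a_N**2 + a_B - 2.0

def solve_nonbasic(a_B, a_N0=1.0, iters=50, tol=1e-10):
    # Newton's method: solve h(a_B, a_N) = 0 for the nonbasic action a_N,
    # holding the policy-produced basic action a_B fixed.
    a_N = a_N0
    for _ in range(iters):
        r = h(a_B, a_N)
        if abs(r) < tol:
            break
        a_N -= r / (2.0 * a_N)  # divide by dh/da_N = 2*a_N
    return a_N

def reduced_gradient(df_dB, df_dN, dh_dB, dh_dN):
    # Implicit function theorem: da_N/da_B = -(dh/da_N)^(-1) * dh/da_B, so
    # the total derivative of f with respect to the basic action alone is
    # df/da_B + (df/da_N) * (da_N/da_B).
    return df_dB - df_dN * dh_dB / dh_dN

# Minimize f(a_B, a_N) = a_B^2 + a_N^2 on the constraint manifold h = 0.
a_B = 0.2
a_N = solve_nonbasic(a_B)
g = reduced_gradient(df_dB=2 * a_B, df_dN=2 * a_N, dh_dB=1.0, dh_dN=2 * a_N)
print(f"a_N = {a_N:.6f}, reduced gradient w.r.t. a_B = {g:.6f}")
# Expected: a_N = sqrt(1.8) ~ 1.341641, reduced gradient = 2*a_B - 1 = -0.6
```

In RPO proper, this implicit derivative of the nonbasic actions with respect to the basic actions is what lets policy gradients propagate through the constraint-solving step into the policy network.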