Constrained Update Projection Approach to Safe Policy Optimization
Neural Information Processing Systems
Safe reinforcement learning (RL) studies problems where an intelligent agent must not only maximize reward but also avoid exploring unsafe regions. In this study, we propose CUP, a novel policy optimization method based on a Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to our development of CUP are the newly proposed surrogate functions along with their performance bound. Compared to previous safe RL methods, CUP generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance. To validate CUP, we compare it against a comprehensive list of safe RL baselines on a wide range of tasks.
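The constrained-update-projection idea described above can be illustrated with a minimal numerical sketch: first an unconstrained improvement step on the reward objective, then a Euclidean projection back onto a linearized cost constraint. The function name, the linearization, and all parameters below are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def cup_style_step(theta, reward_grad, cost_grad, cost_value, cost_limit, lr=0.1):
    """Illustrative two-step update (hypothetical simplification):
    (1) gradient ascent on the reward surrogate,
    (2) projection onto the half-space given by the linearized cost constraint
        cost_value + cost_grad . (theta' - theta) <= cost_limit.
    """
    # Step 1: policy improvement, ignoring the safety constraint.
    theta_mid = theta + lr * reward_grad

    # Step 2: check the linearized cost constraint at the intermediate point.
    slack = cost_value + cost_grad @ (theta_mid - theta) - cost_limit
    if slack <= 0:
        return theta_mid  # already feasible, no projection needed

    # Euclidean projection onto the constraint boundary of the half-space.
    return theta_mid - (slack / (cost_grad @ cost_grad)) * cost_grad
```

For example, if the intermediate point already satisfies the constraint the update passes through unchanged, while a violating update is moved along the cost gradient just far enough to make the linearized constraint hold with equality.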