Constrained Update Projection Approach to Safe Policy Optimization
Neural Information Processing Systems
Safe reinforcement learning (RL) studies problems where an intelligent agent must not only maximize reward but also avoid exploring unsafe regions. In this study, we propose CUP, a novel policy optimization method based on a Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to our development of CUP are the newly proposed surrogate functions along with their performance bound. Compared to previous safe RL methods, CUP generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance. To validate CUP, we compare it against a comprehensive list of safe RL baselines on a wide range of tasks.
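The constrained-update-projection idea described above can be illustrated with a minimal numerical sketch: first an unconstrained improvement step on the reward objective, then a Euclidean projection back onto a linearized cost constraint. The function name, the linearization, and all parameters below are illustrative assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def cup_style_step(theta, reward_grad, cost_grad, cost_value, cost_limit, lr=0.1):
    """Illustrative two-step update (hypothetical simplification):
    (1) gradient ascent on the reward surrogate,
    (2) projection onto the half-space given by the linearized cost constraint
        cost_value + cost_grad . (theta' - theta) <= cost_limit.
    """
    # Step 1: policy improvement, ignoring the safety constraint.
    theta_mid = theta + lr * reward_grad

    # Step 2: check the linearized cost constraint at the intermediate point.
    slack = cost_value + cost_grad @ (theta_mid - theta) - cost_limit
    if slack <= 0:
        return theta_mid  # already feasible, no projection needed

    # Euclidean projection onto the constraint boundary of the half-space.
    return theta_mid - (slack / (cost_grad @ cost_grad)) * cost_grad
```

For example, if the intermediate point already satisfies the constraint the update passes through unchanged, while a violating update is moved along the cost gradient just far enough to make the linearized constraint hold with equality.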