Constrained Update Projection Approach to Safe Policy Optimization

Long Yang

Neural Information Processing Systems 

Safe reinforcement learning (RL) studies problems where an intelligent agent must not only maximize reward but also avoid exploring unsafe areas.