Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

Neural Information Processing Systems 

During the initial iterations of training, agents in most Reinforcement Learning (RL) algorithms perform a significant number of random exploratory steps.