Effects of Safety State Augmentation on Safe Exploration

Neural Information Processing Systems 

Some challenges, however, remain unsolved for the successful deployment of RL, such as efficient learning in constrained or safe Markov Decision Processes (MDPs) [4].