Constrained Cross-Entropy Method for Safe Reinforcement Learning

Min Wen, Ufuk Topcu

Neural Information Processing Systems 

We study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories.
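As a rough illustration of this kind of constraint, the following minimal sketch estimates a policy's expected cumulative cost over finite-length trajectories by Monte Carlo sampling and compares the estimate with a threshold. The toy random-walk environment, the function names `rollout_cost` and `estimate_expected_cost`, the two policies, the horizon, and the threshold `d` are all assumptions made for illustration; none of them come from the paper, and this is not the authors' algorithm.

```python
import random


def rollout_cost(policy, horizon, rng):
    """Total cost of one finite-length trajectory in a toy 1-D random walk.

    Hypothetical environment: the state is an integer starting at 0,
    actions are -1 or +1, and a cost of 1 is incurred at every step
    where the new state satisfies |state| > 2.
    """
    state = 0
    total = 0.0
    for _ in range(horizon):
        state += policy(state, rng)
        total += 1.0 if abs(state) > 2 else 0.0
    return total


def estimate_expected_cost(policy, horizon, n_trajectories, seed=0):
    """Monte Carlo estimate of the expected trajectory cost under `policy`."""
    rng = random.Random(seed)
    costs = [rollout_cost(policy, horizon, rng) for _ in range(n_trajectories)]
    return sum(costs) / len(costs)


def random_policy(state, rng):
    # Step left or right with equal probability.
    return rng.choice((-1, 1))


def cautious_policy(state, rng):
    # Step back toward the origin; from the origin (or left of it), step right.
    return -1 if state > 0 else 1


d = 0.5  # assumed threshold on the expected cost
for name, policy in [("random", random_policy), ("cautious", cautious_policy)]:
    estimate = estimate_expected_cost(policy, horizon=10, n_trajectories=1000)
    status = "satisfies" if estimate <= d else "violates"
    print(f"{name}: estimated expected cost {estimate:.3f} {status} the bound d={d}")
```

Under these assumptions, a policy counts as feasible when its estimated expected cost does not exceed `d`. The estimate is only a sample average, so a real feasibility check would also need to account for sampling error.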