Reviews: Convergent Policy Optimization for Safe Reinforcement Learning

Neural Information Processing Systems 

Quality 4 - 5 Overall 5 - 6

Overall, this seems like a nice paper, but I found it hard to evaluate given my background. I also wish the authors had given some intuition for the theoretical properties of their method. My main concerns are the originality (the work seems very similar to [34]) and the weakness of the experiments.

Originality: 5/10

This paper seems mostly to be about transferring the more general result of [34] to the specific setting of constrained MDPs. So I wish the authors had given more attention to [34], specifically:
- reviewing the contribution of [34] in more detail
- clarifying the novelty of this work (Is it in the specific design choices?