Reviews: Convergent Policy Optimization for Safe Reinforcement Learning
–Neural Information Processing Systems
Quality 4 - 5 Overall 5 - 6

Overall, this seems like a nice paper, but I found it hard to evaluate given my background. I also wish the authors had given some intuition for the theoretical properties of their method. My main concerns are the originality (it seems very similar to [34]) and the weakness of the experiments.

Originality: 5/10

This paper seems mostly to be about transferring the more general result of [34] to the specific setting of constrained MDPs. So I wish the authors gave more attention to [34], specifically:
- reviewing the contribution of [34] in more detail
- clarifying the novelty of this work (Is it in the specific design choices?
Jan-27-2025, 12:17:50 GMT