Review for NeurIPS paper: Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization


Additional Feedback: I really enjoyed reading the paper. It is clearly written and, in my opinion, proposes an elegant solution to an important problem in IRL: the fact that the same policy may arise from multiple reward functions. There are, however, a few aspects on which I think the paper could improve (several of them minor). I number the issues to facilitate the authors' response. My main question concerns the use of the proposed projection with other forms of policy invariance.