Review for NeurIPS paper: Strictly Batch Imitation Learning by Energy-based Distribution Matching

Neural Information Processing Systems 

Additional Feedback:
- The authors note (with references) that pure behavioral cloning performs poorly because it does not use information about the dynamics and state distributions of the problem. It would be useful if the authors could present a short, concrete example of exactly what type of information is lost when the MDP structure is ignored. On a first read, the text seems to imply that the offline setting means we have all the information we *need* from the start, which I think is the opposite of what the authors are trying to say.
- Line 112: This sentence immediately brings to mind a choice between parametric and non-parametric methods. I don't think that is what the authors mean, so perhaps the terminology of "parameterizing a policy" should be changed throughout the paper. If it is what the authors mean, then it is not made clear why a parametric approach is the correct choice.