Review for NeurIPS paper: Strictly Batch Imitation Learning by Energy-based Distribution Matching

Jan-24-2025, 10:29:03 GMT–Neural Information Processing Systems

Additional Feedback:
- The authors note (with references) that pure behavioral cloning performs poorly because it does not use information about the dynamics and state distributions of the problem. It would be useful if the authors could present a short, concrete example of exactly what type of information is lost when the MDP structure is ignored. On a first read, the text seems to imply that the offline setting gives us all the information we *need* from the start, which I think is the opposite of what the authors are trying to say.
- Line 112: This sentence immediately brings to mind a choice between parametric and non-parametric methods. I don't think that is what the authors mean, so the terminology "parameterizing a policy" should perhaps be changed throughout the paper. If it is what they mean, then it is not made clear why a parametric approach is the correct choice.
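
One possible shape for the concrete example requested in the first point is sketched below. It is not taken from the paper: the function name `behavioral_cloning`, the state and action labels, and both demonstration sets are hypothetical. The sketch fits a tabular policy by counting actions per state, which is plain behavioral cloning in its simplest form, and shows that two datasets with the same (state, action) pairs but different observed transitions yield the same policy. The next-state information is exactly what gets discarded.

```python
from collections import Counter, defaultdict


def behavioral_cloning(transitions):
    """Fit a tabular policy from (state, action, next_state) tuples.

    Hypothetical illustration: the next state is deliberately ignored,
    because plain behavioral cloning only uses (state, action) pairs.
    """
    counts = defaultdict(Counter)
    for state, action, _next_state in transitions:
        counts[state][action] += 1
    # Normalize the per-state action counts into probabilities.
    return {
        state: {a: n / sum(c.values()) for a, n in c.items()}
        for state, c in counts.items()
    }


# Two made-up demonstration sets: same (state, action) pairs,
# but the observed transitions (the dynamics) differ.
demos_a = [("s0", "right", "s1"), ("s1", "right", "s2")]
demos_b = [("s0", "right", "s2"), ("s1", "right", "s0")]

policy_a = behavioral_cloning(demos_a)
policy_b = behavioral_cloning(demos_b)

# The learned policies are identical even though the dynamics differ.
print(policy_a == policy_b)
```

A method that models the state distribution would, in principle, be able to distinguish the two datasets, whereas the counting estimator above cannot.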


