Review for NeurIPS paper: Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
–Neural Information Processing Systems
Correctness: The claims and experiments seem mostly correct. While the analysis shows that the solution to the min-max problem (Eq. …) … I would raise my score if the paper were updated to include a proof that the proposed algorithm converges. One comment about the experiments: they do not actually show that the proposed method mimics the expert, only that running the proposed algorithm on data generated by an expert yields high reward. I would also raise my score if an experiment were added showing that the learned policy actually mimics the demonstrator.
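As a rough illustration of the kind of check requested above, the sketch below compares a learned policy's actions with the expert's on held-out expert states, which is distinct from measuring task reward. It is not taken from the paper: the function names `action_agreement` and `mean_action_error`, the `policy_fn` callable, and the data layout are all hypothetical, and a deterministic policy is assumed.

```python
import math

def action_agreement(policy_fn, states, expert_actions):
    """Fraction of held-out expert states where the policy picks the
    expert's action. Assumes discrete actions and a deterministic
    policy_fn(state) -> action (hypothetical interface)."""
    matches = sum(1 for s, a in zip(states, expert_actions) if policy_fn(s) == a)
    return matches / len(states)

def mean_action_error(policy_fn, states, expert_actions):
    """Mean Euclidean distance between policy and expert actions.
    Assumes continuous actions given as equal-length sequences of floats."""
    total = 0.0
    for s, a in zip(states, expert_actions):
        total += math.dist(policy_fn(s), a)
    return total / len(states)

if __name__ == "__main__":
    # Toy data, for illustration only: the "expert" picks the parity of the state.
    states = [0, 1, 2, 3]
    expert_actions = [0, 1, 0, 1]
    print(action_agreement(lambda s: s % 2, states, expert_actions))
    print(action_agreement(lambda s: 0, states, expert_actions))
```

A policy can score high reward while scoring low on such a metric, which is why the two would need to be reported separately.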
Jan-26-2025, 14:02:07 GMT