Review for NeurIPS paper: f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning
–Neural Information Processing Systems
Additional Feedback: My other main concern is that the objective in Eq. (5) is poorly motivated and its implications are underexplored. The imitation learning objective is notoriously ill-defined, and a large part of the literature focuses on introducing objectives that produce good behavior. The notion of finding the "best" f-divergence therefore requires us to state what we are optimizing for, which the authors do not do explicitly. On line 38, the authors mention that an imitation learning method which uses a fixed divergence is likely to learn a sub-optimal policy, but the notion of optimality does not exist without a given divergence. For example, whether mode-seeking or mode-covering behavior is better depends entirely on context that the agent does not have. Either solution could be better.
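To make the mode-seeking versus mode-covering point concrete, here is a minimal toy sketch. It is not taken from the paper: the bimodal "expert" distribution, the single-Gaussian "learner" family, and the grid search are all illustrative assumptions. It fits the learner to the expert once under forward KL and once under reverse KL, two members of the f-divergence family.

```python
import numpy as np

# Discretized support (an assumption made for this toy example).
x = np.linspace(-6.0, 6.0, 601)

def normal_pmf(mean, std):
    """Gaussian density evaluated on the grid and normalized to sum to 1."""
    w = np.exp(-0.5 * ((x - mean) / std) ** 2)
    return w / w.sum()

# Hypothetical bimodal "expert" distribution with modes at -2 and +2.
p = 0.5 * normal_pmf(-2.0, 0.5) + 0.5 * normal_pmf(2.0, 0.5)

def kl(a, b, eps=1e-12):
    """KL(a || b) on the grid; eps guards against log(0)."""
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))))

# Candidate single-Gaussian "learner" distributions, searched by brute force.
means = np.linspace(-3.0, 3.0, 61)
stds = np.linspace(0.3, 3.0, 28)

def best_fit(divergence):
    """Return the (mean, std) of the candidate that minimizes the divergence."""
    candidates = [(divergence(normal_pmf(m, s)), m, s) for m in means for s in stds]
    _, m, s = min(candidates)
    return m, s

fwd_mean, fwd_std = best_fit(lambda q: kl(p, q))  # forward KL(p || q)
rev_mean, rev_std = best_fit(lambda q: kl(q, p))  # reverse KL(q || p)

print("forward KL fit: mean=%.2f std=%.2f" % (fwd_mean, fwd_std))
print("reverse KL fit: mean=%.2f std=%.2f" % (rev_mean, rev_std))
```

In this toy setting the forward-KL fit spreads over both modes while the reverse-KL fit collapses onto one of them. Each fit is optimal under its own divergence, so neither can be called sub-optimal without first fixing the criterion.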
Jan-26-2025, 17:54:59 GMT