

f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning

Neural Information Processing Systems

Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to quantify the discrepancy. This naturally gives rise to the following question: Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency? In this work, we propose f-GAIL - a new generative adversarial imitation learning model - that automatically learns a discrepancy measure from the f-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines with various predefined divergence measures, f-GAIL learns better policies with higher data efficiency in six physics-based control tasks.
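The discrepancy measures f-GAIL searches over come from the f-divergence family, which is typically estimated in adversarial methods via a variational lower bound involving the convex conjugate f* of the generator function f (the f-GAN-style bound). As a minimal illustration, not the authors' implementation, the sketch below estimates that lower bound from samples for a fixed critic; in f-GAIL both the critic and the shape of f itself would be learned. The names `T`, `f_star_kl`, and `f_div_lower_bound` are illustrative.

```python
import numpy as np

# Variational lower bound on an f-divergence (f-GAN-style):
#   D_f(P || Q) >= E_{x~P}[T(x)] - E_{x~Q}[f*(T(x))]
# where f* is the convex conjugate of the generator function f.

def f_star_kl(t):
    """Convex conjugate of f(u) = u*log(u), which generates the KL divergence."""
    return np.exp(t - 1.0)

def f_div_lower_bound(T, expert_samples, learner_samples, f_star=f_star_kl):
    """Sample estimate of the lower bound on D_f(expert || learner) for a critic T."""
    return T(expert_samples).mean() - f_star(T(learner_samples)).mean()

rng = np.random.default_rng(0)
expert = rng.normal(0.0, 1.0, size=10_000)   # stand-in for expert occupancy samples
learner = rng.normal(0.5, 1.0, size=10_000)  # stand-in for learner occupancy samples

# A crude fixed critic for illustration; a learned critic would tighten the bound.
T = lambda x: 0.5 * x
bound = f_div_lower_bound(T, expert, learner)
```

For these two Gaussians the true KL divergence is 0.125, and any critic yields an estimate at or below it; maximizing the bound over critics (and, in f-GAIL, over the choice of f) is what drives the adversarial training.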


Review for NeurIPS paper: f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning

Neural Information Processing Systems

Additional Feedback: My other main concern is that the objective in Eq. (5) is poorly motivated and its implications are underexplored. The imitation learning objective is notoriously ill-defined, and a large part of the literature focuses on introducing objectives that produce good behavior. The notion of finding the "best" f-divergence therefore requires us to state what we are optimizing for, which the authors don't do very explicitly. On line 38, the authors mention that an imitation learning method that uses a fixed divergence is likely to learn a sub-optimal policy, but the notion of optimality does not exist without a given divergence. For example, whether mode-seeking or mode-covering behavior is better is entirely dependent on context that the agent does not have. Either solution could be better.


Review for NeurIPS paper: f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning

Neural Information Processing Systems

After reading the authors' rebuttal, the reviewers discussed their concerns about this paper. Ultimately, a consensus was not reached, as reviewer #3 feels that some of her/his concerns were not properly addressed in the authors' feedback. The other reviewers are positive about the paper (especially thanks to the promising experimental results), but they share one of the concerns of reviewer #3, i.e., the definition of the "optimal f-divergence" and the convergence properties of the proposed approach. I agree with them that the paper has merits and the ideas contained in the paper are interesting, so I propose to accept it, but I recommend that the authors take the issues raised in the reviews seriously and address them carefully in the final version of the paper.

