Reviews: Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Neural Information Processing Systems 

Learning from Observation (LfO) is harder, but more practical, than Learning from Demonstration (LfD), which involves both action and state supervision. The paper studies the difference between the two types of learning from both theoretical and practical perspectives, and relates the gap between LfD and LfO to the inverse dynamics disagreement between the imitator and the expert. The paper includes an elaborate and interesting theoretical analysis of this gap, and proposes a method for bridging it through entropy maximization. The empirical evaluation is also thorough: it includes a toy problem for studying the effect of inverse dynamics discrepancy, MuJoCo problems, and an ablation study. The reviewers agree that this is a good, technically sound paper.