Reviews: Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition

Neural Information Processing Systems 

The paper proposes a method that alternates between learning a reward function and learning a policy. Algorithmically, the proposed method resembles inverse reinforcement learning and imitation learning. However, unlike existing methods that require expert trajectories, the proposed method requires only the goal states that the expert aims to reach. Experiments show that the proposed method reaches the goal states more accurately than an RL method trained with a naïve binary-classification reward.
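
To make the alternating structure concrete, here is a minimal toy sketch. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the 8-state chain environment, the per-state logistic classifier, the deterministic greedy policy, and names such as `update_reward` and `update_policy`. The only things taken from the summary above are the alternation between a reward step and a policy step and the use of goal-state examples in place of expert trajectories.

```python
import math

N_STATES = 8            # hypothetical chain of states 0..7
GOAL_EXAMPLES = [7]     # example goal states; no expert trajectories are used
HORIZON = 20
GAMMA = 0.95


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def step(state, action):
    """Action 1 moves right, action 0 moves left; positions are clipped."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def rollout(policy, start=0):
    """Follow the deterministic policy and return the states it visits."""
    state, visited = start, []
    for _ in range(HORIZON):
        state = step(state, policy[state])
        visited.append(state)
    return visited


def update_reward(logits, positives, negatives, lr=0.5):
    """Reward step: logistic-regression updates on a per-state logit.
    Goal examples are positives; states visited by the policy are negatives."""
    for s in positives:
        logits[s] += lr * (1.0 - sigmoid(logits[s])) / len(positives)
    for s in negatives:
        logits[s] -= lr * sigmoid(logits[s]) / len(negatives)


def update_policy(logits, sweeps=300):
    """Policy step: value iteration on the learned reward log p(goal | s),
    followed by greedy action selection (an assumed stand-in for RL)."""
    reward = [math.log(sigmoid(l)) for l in logits]

    def q(s, a, values):
        nxt = step(s, a)
        return reward[nxt] + GAMMA * values[nxt]

    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [max(q(s, a, values) for a in (0, 1)) for s in range(N_STATES)]
    return [max((0, 1), key=lambda a: q(s, a, values)) for s in range(N_STATES)]


def train(iterations=20):
    logits = [0.0] * N_STATES
    policy = [0] * N_STATES          # initial policy: always move left
    for _ in range(iterations):
        visited = rollout(policy)                       # collect policy states
        update_reward(logits, GOAL_EXAMPLES, visited)   # reward step
        policy = update_policy(logits)                  # policy step
    return logits, policy


logits, policy = train()
print("final state of a rollout:", rollout(policy)[-1])
print("learned per-state logits:", [round(l, 2) for l in logits])
```

In this sketch the classifier is retrained against the states the current policy visits, so the reward keeps changing as the policy improves. A fixed binary classifier trained once would lack that feedback loop; whether this captures the difference measured in the paper's experiments is not something the toy example can establish.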