A Proof for Equation (7) in Section 3.2

Neural Information Processing Systems 

In Section 3.2, we propose a shifting operation in Eq. (7). Below, we summarize the shifting operation and prove its efficacy in Proposition A.1. As presented in Section 3.2, for an f-divergence, its convex conjugate generator function f

The environments used in our experiments are from OpenAI Gym [10], including CartPole [8] from the classic RL literature and five more complex tasks simulated with MuJoCo [32], namely HalfCheetah, Hopper, Reacher, Walker, and Humanoid; task screenshots and version numbers are shown in Figure 1. Note that behavior cloning (BC) employs the same structure to train a policy network with supervised learning. The reward signal networks used in GAIL, BC+GAIL, AIRL, RKL-VIM, and f-GAIL are each composed of three hidden layers of 100 units, with the first two layers activated by tanh and the final activation layers listed in Tab. 3. Details of f

For the ablation study in Sec. 4.3, we varied the number of linear layers over 1, 2, 4, and 7 (with 100 nodes per layer), and the number of nodes per layer over 25, 50, 100, and 200 (with 4 layers).
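As a rough illustration of the reward-signal network layout described above, the following sketch enumerates the layer specification: three hidden layers of 100 units each, tanh after the first two, and a divergence-specific final activation (the choices listed in Tab. 3). This is a hypothetical helper, not the authors' code; the activation of the third hidden layer is not stated in the text, so "identity" is an assumption, and the example input dimension and final activation name are illustrative.

```python
def reward_net_spec(input_dim, final_activation, hidden_units=100):
    """Return (in_dim, out_dim, activation) triples for each linear layer.

    Three hidden layers of `hidden_units` units feed a scalar output;
    the first two hidden layers use tanh, the output layer uses the
    divergence-specific `final_activation` from Tab. 3.
    """
    dims = [input_dim, hidden_units, hidden_units, hidden_units, 1]
    # "identity" for the third hidden layer is an assumption; the text
    # only specifies tanh on the first two layers.
    acts = ["tanh", "tanh", "identity", final_activation]
    return list(zip(dims[:-1], dims[1:], acts))

# Example: a 17-dimensional observation (illustrative) with a
# placeholder final activation name.
spec = reward_net_spec(input_dim=17, final_activation="sigmoid")
```

For the ablation grid in Sec. 4.3, the same helper could be called with different depths and widths (e.g. 1, 2, 4, 7 layers at 100 nodes, or 4 layers at 25, 50, 100, 200 nodes).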