Appendix A Proofs

Neural Information Processing Systems 

The second derivative test confirms that we have a maximum, i.e.

The proof for (b) can be found in the work of Goodfellow et al.

In this section we present Adversarial Soft Q-Fitting (ASQF), a principled approach to Imitation Learning without Reinforcement Learning that relies exclusively on transitions. Using transitions rather than trajectories offers several practical benefits, such as the ability to handle asynchronously collected data or non-sequential expert demonstrations.

We consider the GAN objective of Eq. (5) with

The beginning of the proof closely follows the proof of Appendix A.1.
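
The second-derivative argument mentioned above can be sketched as follows. This is a sketch only, and it assumes the standard pointwise GAN objective of Goodfellow et al. The weights $a$ and $b$ are hypothetical placeholders for the two (strictly positive) densities evaluated at a fixed point. Eq. (5) is not reproduced here, so the paper's actual parameterization may differ.

```latex
% Assumption: pointwise objective f(D) = a log D + b log(1 - D),
% with a, b > 0 and D in (0, 1). The symbols a, b, f are illustrative.
\begin{align*}
f(D)   &= a \log D + b \log(1 - D), \\
f'(D)  &= \frac{a}{D} - \frac{b}{1 - D} = 0
          \;\Longrightarrow\; D^{*} = \frac{a}{a + b}, \\
f''(D) &= -\frac{a}{D^{2}} - \frac{b}{(1 - D)^{2}} < 0
          \quad \text{for all } D \in (0, 1).
\end{align*}
```

Under these assumptions $f''$ is strictly negative on $(0, 1)$, so $f$ is strictly concave there and the stationary point $D^{*}$ is its unique maximizer.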