Implicit Distributional Reinforcement Learning: Appendix

A Proof of Lemma 1

Denote $H = \mathbb{E}_{a \sim \pi} \log \pi$