A Additional statements and proofs

This appendix contains the proofs omitted from the main text, together with additional results and some auxiliary lemmas.

Let P and Q be two probability measures defined on the same measurable space (Ω, F), such that P is absolutely continuous with respect to Q. Then the Donsker-Varadhan dual characterization of the Kullback-Leibler divergence states that
\[
\mathrm{KL}(P \,\|\, Q) \;=\; \sup_{f}\Bigl\{\mathbb{E}_{P}[f] - \log \mathbb{E}_{Q}\bigl[e^{f}\bigr]\Bigr\},
\]
where the supremum is taken over all measurable functions f : Ω → ℝ for which E_Q[e^f] is finite.

Lemma 3. Let X and Y be independent random variables, and let 𝒳 denote the range of X. If g is a measurable function such that g(x, Y) is σ-subgaussian and E[g(x, Y)] = 0 for all x ∈ 𝒳, then g(X, Y) is also σ-subgaussian.

To prove the last part of the lemma, we simply apply Markov's inequality and combine it with this last result: for every λ > 0,
\[
\mathbb{P}(\Psi \ge \varepsilon)
\;=\; \mathbb{P}\bigl(e^{\lambda \Psi} \ge e^{\lambda \varepsilon}\bigr)
\;\le\; e^{-\lambda \varepsilon}\,\mathbb{E}\bigl[e^{\lambda \Psi}\bigr]
\;\le\; \exp\bigl(\lambda^{2}\sigma^{2}/2 - \lambda \varepsilon\bigr),
\]
and minimizing the right-hand side over λ (attained at λ = ε/σ²) yields P(Ψ ≥ ε) ≤ exp(−ε²/(2σ²)).

Taking the expectation over u on both sides, and then using Jensen's inequality to interchange the expectation over u with the absolute value, we obtain the corresponding bound in expectation. Furthermore, each of these summands has zero mean. Taking the expectation over u on both sides and again using Jensen's inequality to interchange the absolute value with the expectation over u, we obtain the analogous averaged inequality. Finally, taking the expectation over z on both sides, and using Jensen's inequality once more to interchange the absolute value with the expectation over z, the resulting quantity can hence be bounded by 1/(4n).
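The constant 1/(4n) is consistent with standard subgaussian bookkeeping under the following illustrative assumption (the summands Z_1, …, Z_n and their boundedness are ours, for illustration only, not taken from the preceding argument): if each of the n independent zero-mean summands takes values in an interval of length one, then each is 1/2-subgaussian by Hoeffding's lemma, and their average satisfies
\[
\frac{1}{n}\sum_{i=1}^{n} Z_i \ \text{is } \frac{1}{2\sqrt{n}}\text{-subgaussian},
\qquad
\Bigl(\frac{1}{2\sqrt{n}}\Bigr)^{2} \;=\; \frac{1}{4n}.
\]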
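For intuition on the Donsker-Varadhan characterization stated above, note that the supremum is attained at the log-likelihood ratio f = log(dP/dQ), which exists by absolute continuity: since E_Q[dP/dQ] = 1,
\[
\mathbb{E}_{P}\Bigl[\log \frac{dP}{dQ}\Bigr] - \log \mathbb{E}_{Q}\Bigl[\frac{dP}{dQ}\Bigr]
\;=\; \mathrm{KL}(P \,\|\, Q) - \log 1
\;=\; \mathrm{KL}(P \,\|\, Q).
\]
Conversely, every measurable f with E_Q[e^f] < ∞ yields a lower bound on KL(P ∥ Q), which is how the dual characterization is typically used.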
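Similarly, a minimal sketch of Lemma 3, assuming the moment-generating-function definition of σ-subgaussianity (E[e^{λV}] ≤ e^{λ²σ²/2} for all λ ∈ ℝ whenever E[V] = 0): by the independence of X and Y and the tower property,
\[
\mathbb{E}\bigl[e^{\lambda g(X,Y)}\bigr]
\;=\; \mathbb{E}_{X}\Bigl[\,\mathbb{E}_{Y}\bigl[e^{\lambda g(x,Y)}\bigr]\Big|_{x=X}\Bigr]
\;\le\; \mathbb{E}_{X}\bigl[e^{\lambda^{2}\sigma^{2}/2}\bigr]
\;=\; e^{\lambda^{2}\sigma^{2}/2}
\qquad \text{for all } \lambda \in \mathbb{R},
\]
so g(X, Y) is σ-subgaussian, as claimed.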