Supplementary materials for "Optimizing Information-theoretical Generalization Bound via Anisotropic Noise in SGLD"

Neural Information Processing Systems 

The supplementary materials are organized as follows. We first record a standard result characterizing the KL divergence between two Gaussian distributions, then prove the auxiliary Lemmas 10 and 11, which together yield Lemma 9, and finally prove Lemma 2.

The first lemma is a standard result characterizing the KL divergence between two Gaussian distributions: for $\mu_1, \mu_2 \in \mathbb{R}^d$ and positive definite $\Sigma_1, \Sigma_2 \in \mathbb{R}^{d \times d}$,
$$\mathrm{KL}\big(\mathcal{N}(\mu_1, \Sigma_1) \,\|\, \mathcal{N}(\mu_2, \Sigma_2)\big) = \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1) - d + \ln\frac{\det \Sigma_2}{\det \Sigma_1}\right].$$

The proof is then completed by induction. Specifically, let $A$ be an anti-symmetric matrix. Since Eq. (12) holds for any anti-symmetric matrix $A$, applying Eq. (12) establishes the inductive step. The proof of Lemma 9 is then obtained by combining Lemma 10 and Lemma 11.

Proof of Lemma 2. The $\beta$-smoothness condition gives
$$R(w') \le R(w) + \langle \nabla R(w),\, w' - w \rangle + \frac{\beta}{2}\|w' - w\|^2.$$
Take the expectation of Eq. (21) with respect to $W$. Applying Eq. (24) back to Eq. (23) completes the proof.
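As a sanity check on the closed-form Gaussian KL divergence above, the following minimal Python sketch (ours, purely illustrative; not part of the paper's code, and all identifiers are hypothetical) compares the closed form against a Monte Carlo estimate built from SciPy log-densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_kl(mu1, sigma1, mu2, sigma2):
    """Closed-form KL( N(mu1, sigma1) || N(mu2, sigma2) ) for full covariances."""
    d = mu1.shape[0]
    sigma2_inv = np.linalg.inv(sigma2)
    diff = mu2 - mu1
    return 0.5 * (
        np.trace(sigma2_inv @ sigma1)
        + diff @ sigma2_inv @ diff
        - d
        + np.log(np.linalg.det(sigma2) / np.linalg.det(sigma1))
    )

rng = np.random.default_rng(0)
d = 3
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
A1, A2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
sigma1 = A1 @ A1.T + np.eye(d)  # random symmetric positive definite covariances
sigma2 = A2 @ A2.T + np.eye(d)

# Monte Carlo estimate: KL = E_{x ~ p1}[ log p1(x) - log p2(x) ]
x = rng.multivariate_normal(mu1, sigma1, size=200_000)
mc = np.mean(
    multivariate_normal.logpdf(x, mu1, sigma1)
    - multivariate_normal.logpdf(x, mu2, sigma2)
)
print(gaussian_kl(mu1, sigma1, mu2, sigma2), mc)  # the two values should agree closely
```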
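Eq. (12) itself is not reproduced here, but the inductive argument hinges on anti-symmetric matrices, for which two elementary identities are standard: $x^{\top} A x = 0$ for every vector $x$, and $\operatorname{tr}(AS) = 0$ for every symmetric $S$. A brief numerical illustration, under the assumption that these are the properties being exploited:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
M = rng.normal(size=(d, d))
A = M - M.T            # anti-symmetric: A.T == -A
S = M + M.T            # symmetric
x = rng.normal(size=d)

print(np.isclose(x @ A @ x, 0.0))        # quadratic forms of anti-symmetric matrices vanish
print(np.isclose(np.trace(A @ S), 0.0))  # trace of anti-symmetric times symmetric vanishes
```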
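For the $\beta$-smoothness step in the proof of Lemma 2, the quadratic upper bound stated above can be verified numerically on a function whose smoothness constant is known: a quadratic $R(w) = \frac{1}{2} w^{\top} H w$ with symmetric positive semi-definite $H$ is $\beta$-smooth with $\beta = \lambda_{\max}(H)$. A minimal sketch (illustrative only, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
B = rng.normal(size=(d, d))
H = B @ B.T                         # symmetric PSD Hessian
beta = np.linalg.eigvalsh(H).max()  # smoothness constant = largest eigenvalue

R = lambda w: 0.5 * w @ H @ w       # quadratic objective
grad_R = lambda w: H @ w            # its gradient

w, w_next = rng.normal(size=d), rng.normal(size=d)
lhs = R(w_next)
rhs = R(w) + grad_R(w) @ (w_next - w) + 0.5 * beta * np.sum((w_next - w) ** 2)
print(lhs <= rhs + 1e-12)  # the beta-smoothness quadratic upper bound holds
```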