 d-distillation


fef6f971605336724b5e6c0c12dc2534-Supplemental.pdf

Neural Information Processing Systems

I W scalars. Taking the expectation of both sides of (17), we obtain [...] The next lemma characterizes the spectral properties of the disagreement matrix, used in Lemma 4. W is also a stochastic matrix. W are that of I W, each with multiplicity K. W) = 1 with multiplicity K. Again we can check that the eigenspace of ( λ We prove this result by induction on n. For n = 1 it is trivial. Now assume that the inequality holds for all l ≤ n − 1. We provide the proof here for completeness.



|, which is constant for all t. Define the total disagreement error as φ (z

The next lemma characterizes the spectral properties of the disagreement matrix, used in Lemma 4. Lemma 7. W is also a stochastic matrix. W are that of I W, each with multiplicity K. Lemma 8. For every n > 0 we have [...] The next lemma is a well-known bound for functions with Lipschitz gradients. Its importance is merely technical, and it is meant to compress our set of assumptions. The MNIST results in Figure 1 used the same settings as above.