d-distillation
fef6f971605336724b5e6c0c12dc2534-Supplemental.pdf
I W scalars. Taking an expectation on both sides of (17) we obtain [...] The next lemma characterizes the spectral properties of the disagreement matrix, used in Lemma 4. W is also a stochastic matrix. W are that of I W, each with multiplicity K. W) = 1 with multiplicity K. Again we can check that the eigenspace of ( λ [...] We prove this result by induction on n. For n = 1 it is trivial. Now assume that the inequality holds for all l ≤ n − 1. We provide the proof here for completeness.
|, which is constant for all t. Define the total disagreement error as φ (z
The next lemma characterizes the spectral properties of the disagreement matrix, used in Lemma 4. Lemma 7. W is also a stochastic matrix. W are that of I W, each with multiplicity K. Lemma 8. For every n > 0 we have [...] The next lemma is a well-known bound for functions with Lipschitz gradients. The importance is merely technical, and is meant to compress our set of assumptions. The MNIST results in Figure 1 used the same settings as above.