Education
63dc7ed1010d3c3b8269faf0ba7491d4-Supplemental.pdf
In this document, we provide details and supplementary materials that cannot fit into the main manuscript due to the page limit. The specific form ofcenter distribution isunknown, but we can still train a generatorG to approximate it. IfR(G,D,T)),wechooseλ=0, i.e., no restriction onR(G,D,T)), to obtain the minimal cost. IfR(G,D,T)) >, then a large λshould be applied as apenalization. According to the derivation of Eq. (3), we obtain arelaxed versionoftheintractableEq.(2),expressedasfollows: min Inknowledge distillation, student models arecrafted using unlabeled datasets, where only thesoft targets from teachers are utilized.