Decoupled Kullback-Leibler Divergence Loss

Neural Information Processing Systems 

Firstly, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property.
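To make the knowledge-distillation setting concrete, here is a minimal plain-Python sketch of the textbook KD loss KL(p_t || p_s). It assumes a frozen teacher, so the teacher probabilities p_t are treated as constants. The sketch checks the standard identity KL(p_t || p_s) = CE(p_t, p_s) - H(p_t). Because H(p_t) does not depend on the student, the gradient with respect to the student logits is exactly the soft-label cross-entropy gradient p_s - p_t. This illustrates one way the optimization is one-sided (teacher to student only) in this setting. It is not the paper's DKL decomposition or its proposed fix, and all helper names (softmax, kl_div, cross_entropy, entropy, kd_loss, numerical_grad) and logit values are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    # CE(p, q) = -sum_i p_i * log(q_i), using p as soft labels.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    # H(p) = -sum_i p_i * log(p_i).
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kd_loss(student_logits, teacher_probs):
    # Hypothetical KD loss: teacher probabilities are fixed constants (frozen teacher).
    return kl_div(teacher_probs, softmax(student_logits))


def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


teacher_logits = [2.0, 0.5, -1.0]
teacher_probs = softmax(teacher_logits)
student_logits = [0.3, 1.2, -0.4]
student_probs = softmax(student_logits)

# KL equals soft-label cross-entropy minus the (student-independent) teacher entropy.
kl = kd_loss(student_logits, teacher_probs)
ce_minus_h = cross_entropy(teacher_probs, student_probs) - entropy(teacher_probs)

# Gradient w.r.t. student logits: analytic p_s - p_t vs. finite differences.
analytic = [ps - pt for ps, pt in zip(student_probs, teacher_probs)]
numeric = numerical_grad(lambda z: kd_loss(z, teacher_probs), student_logits)

print(f"KL = {kl:.6f}, CE - H = {ce_minus_h:.6f}")
print("analytic grad:", [round(g, 6) for g in analytic])
print("numeric grad: ", [round(g, 6) for g in numeric])
```

Since the teacher term H(p_t) is a constant here, minimizing this KL and minimizing the soft-label cross-entropy give the student identical updates. This is why the sketch computes only student-side gradients.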
