What Knowledge Gets Distilled in Knowledge Distillation?

Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee
University of Wisconsin-Madison

Neural Information Processing Systems 

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance on the task at hand. Over the years, there has been a deluge of novel techniques and use cases for knowledge distillation. Yet, despite these improvements, there seems to be a glaring gap in the community's fundamental understanding of the process. Specifically, what is the knowledge that gets distilled in knowledge distillation? In other words, in what ways does the student become similar to the teacher?
