Exploring Feature-based Knowledge Distillation for Recommender System: A Frequency Perspective

Zhu, Zhangchi; Zhang, Wei

arXiv.org Artificial Intelligence 

By defining knowledge as different frequency components of the features, we theoretically demonstrate that regular feature-based knowledge distillation is equivalent to equally minimizing losses on all knowledge and further analyze how this equal loss weight allocation method leads to important knowledge being overlooked. In light of this, we propose to emphasize important knowledge by redistributing knowledge weights. Furthermore, we propose FreqD, a lightweight knowledge reweighting method, to avoid the computational cost

To improve the inference efficiency without sacrificing accuracy, many studies [10, 11, 13, 31] have adopted Knowledge Distillation (KD) to recommender systems. KD is a model-agnostic approach for model compression [6, 8]. In knowledge distillation for recommendation, the common process is first to train a large teacher model using the user-item interactions, then train a small student model using the user-item interactions as well as the features in the intermediate layer [10, 11, 13] and the predictions in the output layer [1, 10, 15, 17] provided by the teacher model.
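The student-side objective in the process described above can be illustrated with a minimal sketch. This is not the method proposed in this paper; it only shows the generic combination of an interaction loss, a feature-level distillation term, and a prediction-level distillation term. The function names, the linear projector, the pairwise ranking loss, the mean-squared-error distillation terms, and the weights `lam_feat` and `lam_pred` are all assumptions chosen for illustration.

```python
import numpy as np


def bpr_loss(pos_scores, neg_scores):
    """Pairwise ranking loss on user-item interactions (assumed base loss)."""
    # -log(sigmoid(x)) equals log(1 + exp(-x)); logaddexp keeps it stable.
    diff = pos_scores - neg_scores
    return float(np.mean(np.logaddexp(0.0, -diff)))


def feature_kd_loss(student_feats, teacher_feats, projector):
    """Feature-level distillation: MSE between projected student features
    and the teacher's intermediate-layer features."""
    # The projector maps the student dimension to the teacher dimension.
    projected = student_feats @ projector
    return float(np.mean((projected - teacher_feats) ** 2))


def prediction_kd_loss(student_scores, teacher_scores):
    """Prediction-level distillation: MSE on output scores (a stand-in
    for the ranking-based losses often used in practice)."""
    return float(np.mean((student_scores - teacher_scores) ** 2))


def student_objective(pos, neg, s_feats, t_feats, projector,
                      s_scores, t_scores, lam_feat=0.5, lam_pred=0.5):
    """Total student loss: interaction loss plus weighted distillation terms.
    lam_feat and lam_pred are hypothetical trade-off weights."""
    base = bpr_loss(pos, neg)
    feat = feature_kd_loss(s_feats, t_feats, projector)
    pred = prediction_kd_loss(s_scores, t_scores)
    return base + lam_feat * feat + lam_pred * pred


# Toy usage with random data: 16 items, student dim 8, teacher dim 32.
rng = np.random.default_rng(0)
student_feats = rng.normal(size=(16, 8))
teacher_feats = rng.normal(size=(16, 32))
projector = rng.normal(size=(8, 32)) * 0.1
pos, neg = rng.normal(size=16), rng.normal(size=16)
s_scores, t_scores = rng.normal(size=16), rng.normal(size=16)

total = student_objective(pos, neg, student_feats, teacher_feats,
                          projector, s_scores, t_scores)
print("student objective:", total)
```

In this sketch the teacher's features and scores are treated as fixed inputs, reflecting that the teacher is trained first and only the student (and, in practice, the projector) is updated during distillation.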