Exploring Feature-based Knowledge Distillation for Recommender System: A Frequency Perspective
arXiv.org Artificial Intelligence
By defining knowledge as different frequency components of the features, we theoretically demonstrate that regular feature-based knowledge distillation is equivalent to equally minimizing losses on all knowledge and further analyze how this equal loss weight allocation method leads to important knowledge being overlooked. In light of this, we propose to emphasize important knowledge by redistributing knowledge weights. Furthermore, we propose FreqD, a lightweight knowledge reweighting method, to avoid the computational cost …

To improve the inference efficiency without sacrificing accuracy, many studies [10, 11, 13, 31] have adopted Knowledge Distillation (KD) to recommender system. KD is a model-agnostic approach for model compression [6, 8]. In knowledge distillation for recommendation, the common process is first to train a large teacher model using the user-item interactions, then train a small student model using the user-item interactions as well as the features in the intermediate layer [10, 11, 13] and the predictions in the output layer [1, 10, 15, 17] provided by the teacher model.
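The claim that plain feature-based distillation weights every frequency component equally can be illustrated with a small sketch. This excerpt does not say how the frequency components are computed, so the code below assumes, purely for illustration, a unitary discrete Fourier transform over a single feature vector; the function names (`unitary_dft`, `feature_loss`, `per_frequency_losses`, `reweighted_loss`) and the example weights are hypothetical and are not FreqD itself. Under that assumption, the squared error between teacher and student features equals the sum of per-frequency losses with all weights set to 1, and changing the weights emphasizes chosen components.

```python
import cmath
import math
import random


def unitary_dft(x):
    """Unitary DFT (scaled by 1/sqrt(n)) so that energy is preserved."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def feature_loss(teacher, student):
    """Plain squared-error feature distillation loss."""
    return sum((t - s) ** 2 for t, s in zip(teacher, student))


def per_frequency_losses(teacher, student):
    """Squared error on each frequency component of the feature difference."""
    diff = [t - s for t, s in zip(teacher, student)]
    return [abs(c) ** 2 for c in unitary_dft(diff)]


def reweighted_loss(teacher, student, weights):
    """Weighted sum of per-frequency losses (weights are an assumption here)."""
    losses = per_frequency_losses(teacher, student)
    return sum(w * l for w, l in zip(weights, losses))


random.seed(0)
dim = 8
teacher = [random.gauss(0.0, 1.0) for _ in range(dim)]
student = [random.gauss(0.0, 1.0) for _ in range(dim)]

# Equal weights on every component reproduce the plain feature loss.
plain = feature_loss(teacher, student)
equal = reweighted_loss(teacher, student, [1.0] * dim)

# Hypothetical reweighting: double the weight of the two lowest frequency indices.
emphasis = [2.0 if k < 2 else 1.0 for k in range(dim)]
emphasized = reweighted_loss(teacher, student, emphasis)

print(abs(plain - equal) < 1e-9)
```

The equality of `plain` and `equal` follows from Parseval's identity for a unitary transform. Computing every per-frequency loss explicitly, as done here, is the kind of extra cost that a lightweight reweighting method would aim to avoid.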
Jan-13-2025
- Country:
  - North America > Canada (0.16)
  - Asia > China (0.14)
- Genre:
  - Research Report
    - New Finding (0.67)
    - Experimental Study (0.46)
- Industry:
  - Education (1.00)
- Technology: