BPL: Bias-adaptive Preference Distillation Learning for Recommender System
Kang, SeongKu, Lian, Jianxun, Lee, Dongha, Kweon, Wonbin, Jang, Sanghwan, Lee, Jaehyun, Wang, Jindong, Xie, Xing, Yu, Hwanjo
arXiv.org Artificial Intelligence
Abstract: Recommender systems suffer from biases that cause the collected feedback to reveal user preferences only incompletely. Although debiasing learning has been extensively studied, existing methods mostly focus on a specialized (counterfactual) test environment simulated by random exposure of items, and they significantly degrade accuracy in the typical (factual) test environment based on actual user-item interactions. In fact, each test environment highlights a different benefit: the counterfactual test emphasizes long-term user satisfaction, while the factual test focuses on predicting subsequent user behavior on the platform. It is therefore desirable to have a model that performs well on both tests rather than only one. In this work, we introduce a new learning framework, called Bias-adaptive Preference distillation Learning (BPL), to gradually uncover user preferences with dual distillation strategies. These distillation strategies are designed to drive high performance in both factual and counterfactual test environments. Employing a specialized form of teacher-student distillation from a biased model, BPL retains accurate preference knowledge aligned with the collected feedback, leading to high performance in the factual test. This enables the model to produce more accurate predictions across a broader range of user-item combinations, thereby improving performance in the counterfactual test.

Real-world recommender systems form a feedback loop in which the systems' recommendations influence user behaviors, which in turn serve as training data for the system [1]. This feedback loop creates and amplifies various biases driven by multiple factors, including but not limited to user selection patterns, item exposure mechanisms, and the influence of public opinion [2], [3]. These biases progressively cause the training data to deviate from users' true preferences, ultimately degrading user satisfaction.
SeongKu Kang is with the Department of Computer Science and Engineering, Korea University, Seoul, South Korea. Dongha Lee is with the Department of Artificial Intelligence, Yonsei University, Seoul, South Korea. E-mail: donalee@yonsei.ac.kr. Jindong Wang is with William & Mary, Virginia, United States.
Oct-21-2025
- Country:
  - North America > United States > Virginia (0.24)
  - Asia > South Korea
- Genre:
  - Research Report > Experimental Study (0.68)
  - Research Report > New Finding (0.46)