
Revisiting Fairness-aware Interactive Recommendation: Item Lifecycle as a Control Knob

Lu, Yun, Shi, Xiaoyu, Xie, Hong, Xia, Chongjun, Gong, Zhenhui, Shang, Mingsheng

arXiv.org Artificial Intelligence

This paper revisits fairness-aware interactive recommendation on short-video platforms (e.g., TikTok, KuaiShou) by introducing a novel control knob: the lifecycle of items. We make three contributions. First, we conduct a comprehensive empirical analysis and find that item lifecycles on short-video platforms follow a compressed three-phase pattern of rapid growth, transient stability, and sharp decay, which deviates significantly from the classical four-stage model (introduction, growth, maturity, decline). Second, we introduce LHRL, a lifecycle-aware hierarchical reinforcement learning framework that dynamically harmonizes fairness and accuracy by leveraging phase-specific exposure dynamics. LHRL consists of two key components: (1) PhaseFormer, a lightweight encoder combining STL decomposition and attention mechanisms for robust phase detection; and (2) a two-level HRL agent, in which the high-level policy imposes phase-aware fairness constraints and the low-level policy optimizes immediate user engagement. This decoupled optimization effectively reconciles long-term equity with short-term utility. Third, experiments on multiple real-world interactive recommendation datasets demonstrate that LHRL significantly improves both fairness and user engagement. Furthermore, integrating lifecycle-aware rewards into existing RL-based models consistently yields performance gains, highlighting the generalizability and practical value of our approach.
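The two-level idea in the abstract can be illustrated with a minimal reward-shaping sketch. The phase boundaries, fairness weights, and function names below are illustrative assumptions only, not the paper's LHRL implementation (which detects phases with PhaseFormer and learns both policy levels):

```python
# Hypothetical sketch of a lifecycle-aware reward. The three-phase lifecycle
# (growth, stability, decay) comes from the paper; the thresholds and weights
# here are assumed values for illustration.

def lifecycle_phase(age_days: int, growth_end: int = 3, stable_end: int = 10) -> str:
    """Classify an item's lifecycle phase from its age (three-phase model)."""
    if age_days <= growth_end:
        return "growth"
    if age_days <= stable_end:
        return "stability"
    return "decay"

# Phase-specific fairness weights: exposure fairness matters most in decay,
# where under-exposure is most likely (assumed values).
FAIRNESS_WEIGHT = {"growth": 0.1, "stability": 0.3, "decay": 0.6}

def shaped_reward(engagement: float, exposure_deficit: float, age_days: int) -> float:
    """Low-level engagement reward plus a phase-aware fairness bonus."""
    w = FAIRNESS_WEIGHT[lifecycle_phase(age_days)]
    return engagement + w * exposure_deficit
```

In this toy form, the "high-level policy" reduces to a fixed phase-to-weight table; the point is only that the fairness pressure applied to an item depends on where it sits in its lifecycle.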


52130c418d4f02c74f74a5bc1f8020b2-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their positive comments, and address their major questions and comments below. Clarifications will be added in the revision and we will keep improving our draft. Reviewer #1 We thank the reviewer for the positive reviews. The remarks raised are addressed below. We are happy to release our code for better reproducibility.




Reviews: Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning

Neural Information Processing Systems

The method (Eq. (3), Eq. (5), and the model details) is consistent with the target task. The reward and constraints are reasonably designed. The experimental setting is remarkable (especially the online evaluation by simulator and the four proposed evaluation metrics) and the results are positive. However, this paper still has the following minor issues.


Debiased Model-based Interactive Recommendation

Li, Zijian, Cai, Ruichu, Huang, Haiqin, Zhang, Sili, Yan, Yuguang, Hao, Zhifeng, Dong, Zhenghua

arXiv.org Artificial Intelligence

Existing model-based interactive recommendation systems are trained by querying a world model to capture user preferences, but learning the world model from historical logged data easily suffers from bias issues such as popularity bias and sampling bias, which is why several debiased methods have been proposed recently. However, two essential drawbacks remain: (1) ignoring the dynamics of time-varying popularity results in false reweighting of items; (2) treating unknown samples as negatives during negative sampling introduces sampling bias. To overcome these two drawbacks, we develop the identifiable Debiased Model-based Interactive Recommendation model (iDMIR for short). In iDMIR, for the first drawback, we devise a debiased causal world model, with identification guarantees, based on the causal mechanism of the time-varying recommendation generation process; for the second drawback, we devise a debiased contrastive policy, which coincides with debiased contrastive learning and avoids sampling bias. Moreover, we demonstrate that the proposed method not only outperforms several recent interactive recommendation algorithms but also delivers diverse recommendations.
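To see why a static popularity estimate misweights items, consider a simple inverse-propensity-style reweighting computed per time step. This is only an illustrative sketch of the problem the first drawback describes; iDMIR itself addresses it with a causal world model rather than IPS, and the function name is hypothetical:

```python
def ips_weight(item_clicks_t: int, total_clicks_t: int, eps: float = 1e-6) -> float:
    """Down-weight interactions with items that are popular *at time t*.

    Using a single, time-averaged popularity instead of the per-step count
    would over- or under-weight items whose popularity drifts over time,
    i.e., the "false reweighting" caused by ignoring popularity dynamics.
    """
    propensity = item_clicks_t / max(total_clicks_t, 1)  # time-varying popularity
    return 1.0 / (propensity + eps)  # rarer exposure => larger training weight
```

An item that held 90% of clicks last month but only 50% today should be reweighted by today's propensity; a static estimate would get this wrong in both directions.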


Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation

Chen, Xiaocong, Huang, Chaoran, Yao, Lina, Wang, Xianzhi, Liu, Wei, Zhang, Wenjie

arXiv.org Machine Learning

Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy. Reinforcement learning is inherently well suited to dynamic environments and has thus attracted increasing attention in interactive recommendation research. Inspired by knowledge-aware recommendation, we propose Knowledge-Guided deep Reinforcement Learning (KGRL) to harness the advantages of both reinforcement learning and knowledge graphs for interactive recommendation. The model is built on the actor-critic framework: it maintains a local knowledge network to guide decision-making and employs an attention mechanism to capture long-term semantics among items. We have conducted comprehensive experiments in a simulated online environment on six public real-world datasets and demonstrated the superiority of our model over several state-of-the-art methods.


Factorization Bandits for Interactive Recommendation

Wang, Huazheng (University of Virginia) | Wu, Qingyun (University of Virginia) | Wang, Hongning (University of Virginia)

AAAI Conferences

We perform online interactive recommendation via a factorization-based bandit algorithm. Low-rank matrix completion is performed over an incrementally constructed user-item preference matrix, and an upper-confidence-bound-based item selection strategy balances the exploit/explore trade-off during online learning. Observable contextual features and dependencies among users (e.g., social influence) are leveraged to improve the algorithm's convergence rate and to mitigate cold start in recommendation. A high-probability sublinear upper regret bound is proved for the developed algorithm, with considerable regret reduction achieved on both the user and item sides. Extensive experiments on both simulations and large-scale real-world datasets confirm the advantages of the proposed algorithm over several state-of-the-art factorization-based and bandit-based collaborative filtering methods.
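The exploit/explore balance described above can be sketched as a generic UCB score over learned latent factors. The paper's actual algorithm derives its confidence bound from the factorization updates; the simple count-based bonus below is a stand-in assumption, and all names are illustrative:

```python
import numpy as np

def select_item(user_vec: np.ndarray, item_vecs: np.ndarray,
                pull_counts: np.ndarray, t: int, alpha: float = 1.0) -> int:
    """Return the index of the item maximizing predicted reward plus a UCB bonus.

    Exploit: dot product of latent user and item factors (estimated preference).
    Explore: a count-based confidence bonus that shrinks for well-observed items.
    """
    exploit = item_vecs @ user_vec
    explore = alpha * np.sqrt(np.log(t + 1) / (pull_counts + 1))
    return int(np.argmax(exploit + explore))
```

Early on, rarely shown items carry a large bonus and get explored; as counts grow, selection converges to the factorization's preference estimates, which is the behavior the sublinear regret bound formalizes.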