implicit feedback
Counterfactual Implicit Feedback Modeling
In recommendation systems, implicit feedback data can be automatically recorded and is more common than explicit feedback data. However, implicit feedback poses two challenges for relevance prediction, namely (a) positive-unlabeled (PU): negative feedback does not necessarily imply low relevance and (b) missing not at random (MNAR): items that are popular or frequently recommended tend to receive more clicks than other items, even if the user does not have a significant interest in them. Existing methods either overlook the MNAR issue or fail to account for the inherent mechanism of the PU issue. As a result, they may lead to inaccurate relevance predictions or inflated biases and variances. In this paper, we formulate the implicit feedback problem as a counterfactual estimation problem with missing treatment variables.
Bandit Learning with Implicit Feedback
Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence of users' evaluation of the system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our regret upper bound analysis of the proposed algorithm proves the feasibility of learning from implicit feedback in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.
Modeling Dynamic Missingness of Implicit Feedback for Recommendation
Menghan Wang, Mingming Gong, Xiaolin Zheng, Kun Zhang
Collaborative filtering methods based on implicit feedback (e.g., purchase records and browsing history) are widely used in recommender systems. Compared to explicit feedback (e.g., 1-5 star ratings), implicit feedback is more abundant and accessible in real-world applications. However, the missing data of implicit feedback also brings two challenges.
G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
Boyu Chen, Siran Chen, Zhengrong Yue, Kainan Yan, Chenyun Yu, Beibei Kong, Cheng Lei, Chengxiang Zhuo, Zang Li, Yali Wang
User feedback is critical for refining recommendation systems, yet explicit feedback (e.g., likes or dislikes) remains scarce in practice. As a more feasible alternative, inferring user preferences from massive implicit feedback has shown great potential (e.g., a user quickly skipping a recommended video usually indicates disinterest). Unfortunately, implicit feedback is often noisy: a user might skip a video due to accidental clicks or other reasons, rather than disliking it. Such noise can easily lead to misjudged user interests, thereby undermining recommendation performance. To address this issue, we propose a novel Group-aware User Behavior Simulation (G-UBS) paradigm, which leverages contextual guidance from relevant user groups, enabling robust and in-depth interpretation of implicit feedback for individual users. Specifically, G-UBS operates via two key agents. First, the User Group Manager (UGM) effectively clusters users to generate group profiles utilizing a "summarize-cluster-reflect" workflow based on LLMs. Second, the User Feedback Modeler (UFM) employs an innovative group-aware reinforcement learning approach, where each user is guided by the associated group profiles during the reinforcement learning process, allowing UFM to robustly and deeply examine the reasons behind implicit feedback. To assess our G-UBS paradigm, we have constructed a Video Recommendation benchmark with Implicit Feedback (IF-VR). To the best of our knowledge, this is the first multi-modal benchmark for implicit feedback evaluation in video recommendation, encompassing 15k users, 25k videos, and 933k interaction records with implicit feedback. Extensive experiments on IF-VR demonstrate that G-UBS significantly outperforms mainstream LLMs and MLLMs, with a 4.0% higher proportion of videos achieving a play rate > 30% and 14.9% higher reasoning accuracy on IF-VR.
Generalization Bounds for Semi-supervised Matrix Completion with Distributional Side Information
Antoine Ledent, Mun Chong Soo, Nong Minh Hieu
We study a matrix completion problem where both the ground truth matrix $R$ and the unknown sampling distribution $P$ over observed entries are low-rank matrices, and \textit{share a common subspace}. We assume that a large amount $M$ of \textit{unlabeled} data drawn from the sampling distribution $P$ is available, together with a small amount $N$ of labeled data drawn from the same distribution and noisy estimates of the corresponding ground truth entries. This setting is inspired by recommender systems scenarios where the unlabeled data corresponds to `implicit feedback' (consisting of interactions such as purchases, clicks, etc.) and the labeled data corresponds to the `explicit feedback', consisting of interactions where the user has given an explicit rating to the item. Leveraging powerful results from the theory of low-rank subspace recovery, together with classic generalization bounds for matrix completion models, we show error bounds consisting of a sum of two error terms scaling as $\widetilde{O}\left(\sqrt{\frac{nd}{M}}\right)$ and $\widetilde{O}\left(\sqrt{\frac{dr}{N}}\right)$ respectively, where $d$ is the rank of $P$ and $r$ is the rank of $R$. In synthetic experiments, we confirm that the true generalization error naturally splits into independent error terms corresponding to the estimations of $P$ and the ground truth matrix $R$ respectively. In real-life experiments on Douban and MovieLens with most explicit ratings removed, we demonstrate that the method can outperform baselines relying only on the explicit ratings, demonstrating that our assumptions provide a valid toy theoretical setting to study the interaction between explicit and implicit feedback in recommender systems.
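Written out as a single display, the two-term bound from this abstract reads roughly as follows. This is a schematic restatement only: the attribution of each term to a data source follows the abstract's description, $n$ is assumed here to denote the matrix dimension (the abstract does not define it), and constants and logarithmic factors are hidden in $\widetilde{O}$.

```latex
\[
\text{error}
\;\lesssim\;
\underbrace{\widetilde{O}\!\left(\sqrt{\frac{nd}{M}}\right)}_{\text{estimating } P \text{ from } M \text{ unlabeled samples}}
\;+\;
\underbrace{\widetilde{O}\!\left(\sqrt{\frac{dr}{N}}\right)}_{\text{estimating the ground truth from } N \text{ labeled samples}}
\]
```

Ignoring the hidden factors, the first term is no larger than the second once $nd/M \le dr/N$, that is, once $M \ge nN/r$, which is consistent with the regime of abundant implicit feedback and scarce explicit ratings that the abstract describes.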