Personal Assistant Systems
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis
Anelli, Vito Walter, Malitesta, Daniele, Pomo, Claudio, Bellogín, Alejandro, Di Noia, Tommaso, Di Sciascio, Eugenio
These groundbreaking models are designed to represent users and items as a bipartite, undirected graph, unlocking a whole new level of high-order relationships that were previously almost unattainable. Not only do they achieve better accuracy than their predecessors, but they are also setting a new standard for modern recommender systems [20, 28, 47, 79]. In recent years, great effort has been devoted to creating GNN-based models that address the critical issues of existing models, such as the over-smoothing phenomenon [12] and scalability issues [87]. These cutting-edge models are taking the world of recommender systems by storm and ushering in a new era of accuracy [41, 47, 51, 59, 81]. Over the past ten years, the application of neural techniques rooted in graph representation learning, such as graph convolutional networks [35] (GCNs), has introduced a fresh perspective on traditional collaborative filtering (CF) approaches. Rather than relying solely on user-item interactions for optimization [29, 36, 55], GCN-based methods enable the extraction of both short- and long-distance user preferences toward items [71]. By incorporating multi-hop relationships into the embeddings of users and items, these learned profiles yield more precise recommendations, as evidenced in the literature [28, 47].
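As a rough illustration of how multi-hop relationships can enter user and item embeddings, the sketch below runs a parameter-free, symmetrically normalized neighbor aggregation over a toy bipartite graph. The graph, the one-hot starting embeddings, and the function name `propagate` are assumptions made for this example; it is not taken from any of the cited models.

```python
from math import sqrt

def propagate(edges, emb, num_layers):
    """Return the list of per-layer embeddings (layer 0 is the input)."""
    # Build undirected adjacency lists from user-item edges.
    neighbors = {node: [] for node in emb}
    for user, item in edges:
        neighbors[user].append(item)
        neighbors[item].append(user)
    layers = [emb]
    for _ in range(num_layers):
        prev, nxt = layers[-1], {}
        for node, nbrs in neighbors.items():
            vec = [0.0] * len(prev[node])
            for nb in nbrs:
                # Symmetric normalization: 1 / sqrt(deg(node) * deg(neighbor)).
                w = 1.0 / sqrt(len(nbrs) * len(neighbors[nb]))
                for d, value in enumerate(prev[nb]):
                    vec[d] += w * value
            nxt[node] = vec
        layers.append(nxt)
    return layers

# Toy graph (hypothetical): u0 and u1 share item i1; only u1 interacted with i2.
nodes = ["u0", "u1", "i0", "i1", "i2"]
edges = [("u0", "i0"), ("u0", "i1"), ("u1", "i1"), ("u1", "i2")]
# One-hot inputs make it easy to see which nodes influence which embedding.
emb = {n: [1.0 if k == j else 0.0 for k in range(len(nodes))]
       for j, n in enumerate(nodes)}
layers = propagate(edges, emb, num_layers=3)
i2 = nodes.index("i2")
print(layers[1]["u0"][i2], layers[3]["u0"][i2])
```

After one layer, `u0` carries no signal from `i2`; after three layers it does, via the path u0, i1, u1, i2. That is the kind of long-distance preference signal the text refers to.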
Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
Xu, Ruiyang, Bhandari, Jalaj, Korenkevych, Dmytro, Liu, Fan, He, Yuchen, Nikulkov, Alex, Zhu, Zheqing
Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming ideas, which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system that handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
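The two ingredients named above, temporal difference learning and one-step policy improvement, can be sketched generically as follows. This is a minimal tabular illustration under assumed names (`td0_update`, `one_step_improvement`, the state labels, and the candidate fields are all hypothetical); the abstract does not specify how the production system represents states, values, or bids.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    target = reward + gamma * values.get(next_state, 0.0)
    current = values.get(state, 0.0)
    values[state] = current + alpha * (target - current)
    return values[state]

def one_step_improvement(candidates, values, gamma=0.9):
    """Pick the candidate with the best immediate score plus discounted value
    of the state it leads to, where values come from the base policy."""
    return max(
        candidates,
        key=lambda c: c["immediate"] + gamma * values.get(c["next_state"], 0.0),
    )

# Evaluate the base policy from one logged transition (toy data).
values = {}
first = td0_update(values, "s", reward=1.0, next_state="s2")

# Assume the value estimates have been learned elsewhere.
learned = {"engaged": 1.0, "churned": 0.0}
candidates = [
    {"name": "a", "immediate": 1.0, "next_state": "churned"},
    {"name": "b", "immediate": 0.8, "next_state": "engaged"},
]
choice = one_step_improvement(candidates, learned)
print(first, choice["name"])
```

In this toy case candidate `b` wins despite its lower immediate score, because the state it leads to has a higher estimated long-term value.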
Evaluating Online Bandit Exploration In Large-Scale Recommender System
Guo, Hongbo, Naeff, Ruben, Nikulkov, Alex, Zhu, Zheqing
Bandit learning has been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remain multiple bottlenecks that prevent many bandit learning approaches from productionalization. One major bottleneck is how to test the effectiveness of bandit algorithms with fairness and without data leakage. Unlike supervised learning algorithms, bandit learning algorithms place great emphasis on the data collection process through their explorative nature. Such explorative behavior may induce unfair evaluation in a classic A/B test setting. In this work, we apply upper confidence bound (UCB) to our large-scale short video recommender system and present a test framework for the production bandit learning life-cycle with a new set of metrics. Extensive experiment results show that our experiment design is able to fairly evaluate the performance of bandit learning in the recommender system.
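For readers unfamiliar with UCB, the following is a minimal sketch of the textbook UCB1 selection rule, not the production variant evaluated in the paper. The function name `ucb1_select` and the exploration constant `c` are choices made for this example.

```python
import math

def ucb1_select(counts, sums, c=2.0):
    """Return the index of the arm with the highest upper confidence bound.

    counts[i] is how often arm i was played, sums[i] its total reward.
    Arms that were never played are selected first.
    """
    total = sum(counts)
    best_arm, best_score = -1, float("-inf")
    for arm, (n, s) in enumerate(zip(counts, sums)):
        if n == 0:
            return arm
        # Empirical mean plus an exploration bonus that shrinks with n.
        score = s / n + math.sqrt(c * math.log(total) / n)
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm

unplayed_first = ucb1_select([10, 0], [5.0, 0.0])
explore_rare = ucb1_select([100, 2], [50.0, 0.8])
print(unplayed_first, explore_rare)
```

In the second call the rarely played arm is chosen even though its empirical mean (0.4) is lower than the other arm's (0.5). This deliberate exploration is what makes the logged data depend on the algorithm, and hence what complicates a classic A/B comparison.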
Online Matching: A Real-time Bandit System for Large-scale Recommendations
Yi, Xinyang, Wang, Shao-Chuan, He, Ruining, Chandrasekaran, Hariharan, Wu, Charles, Heldt, Lukasz, Hong, Lichan, Chen, Minmin, Chi, Ed H.
The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While being effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in serving massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in unfavorable user experience, highlighting the need for devising intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system learning from users' direct feedback on items in real time. We present a hybrid "offline + online" approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of the LinUCB algorithm -- to enable distributed updates of bandit parameters in a scalable and timely manner. We conduct live experiments on YouTube and show that Online Matching is able to enhance the capabilities of fresh content discovery and item exploration in the present platform.
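The abstract does not spell out Diag-LinUCB, so the sketch below should be read only as an illustration of one plausible ingredient: a LinUCB-style scorer that keeps just the diagonal of the covariance matrix. The class name `DiagLinUCBSketch`, the unit regularizer, and the `alpha` parameter are assumptions for this example, not the paper's algorithm.

```python
import math

class DiagLinUCBSketch:
    """LinUCB-style scorer with a diagonal covariance approximation (illustrative)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.a_diag = [1.0] * dim  # diagonal of A, initialized to the identity
        self.b = [0.0] * dim       # reward-weighted feature sums

    def score(self, x):
        # Mean estimate theta . x with theta_d = b_d / a_d.
        mean = sum(b / a * xi for a, b, xi in zip(self.a_diag, self.b, x))
        # Confidence width sqrt(x^T A^{-1} x) under the diagonal approximation.
        width = math.sqrt(sum(xi * xi / a for a, xi in zip(self.a_diag, x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        # Each coordinate is updated independently of the others.
        for d, xi in enumerate(x):
            self.a_diag[d] += xi * xi
            self.b[d] += reward * xi

model = DiagLinUCBSketch(dim=2)
before = model.score([1.0, 0.0])
model.update([1.0, 0.0], reward=1.0)
after_seen = model.score([1.0, 0.0])
after_unseen = model.score([0.0, 1.0])
print(before, after_seen, after_unseen)
```

Because every coordinate in `update` touches only its own entries of `a_diag` and `b`, the state could in principle be sharded and updated in parallel, which is one way a diagonal approximation can help with distributed updates.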
Recommendation Unlearning via Matrix Correction
Liu, Jiahao, Li, Dongsheng, Gu, Hansu, Lu, Tun, Wu, Jiongran, Zhang, Peng, Shang, Li, Gu, Ning
Recommender systems are important for providing personalized services to users, but the vast amount of collected user data has raised concerns about privacy (e.g., sensitive data), security (e.g., malicious data) and utility (e.g., toxic data). To address these challenges, recommendation unlearning has emerged as a promising approach, which allows specific data and models to be forgotten, mitigating the risks of sensitive/malicious/toxic user data. However, existing methods often struggle to balance completeness, utility, and efficiency, i.e., compromising one for the other, leading to suboptimal recommendation unlearning. In this paper, we propose an Interaction and Mapping Matrices Correction (IMCorrect) method for recommendation unlearning. Firstly, we reveal that many collaborative filtering (CF) algorithms can be formulated as a mapping-based approach, in which the recommendation results can be obtained by multiplying the user-item interaction matrix with a mapping matrix. Then, IMCorrect can achieve efficient recommendation unlearning by correcting the interaction matrix and enhance the completeness and utility by correcting the mapping matrix, all without costly model retraining. Unlike existing methods, IMCorrect is a whitebox model that offers greater flexibility in handling various recommendation unlearning scenarios. Additionally, it has the unique capability of incrementally learning from new data, which further enhances its practicality. We conducted comprehensive experiments to validate the effectiveness of IMCorrect, and the results demonstrate that IMCorrect is superior in completeness, utility, and efficiency, and is applicable in many recommendation unlearning scenarios.
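The mapping-based view described above (scores obtained as interaction matrix times mapping matrix) can be illustrated with a simple item co-occurrence mapping. The sketch below is not IMCorrect: it uses a toy interaction matrix, a hypothetical co-occurrence mapping, and naive recomputation of the mapping as a stand-in for the paper's correction step.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cooccurrence_mapping(r):
    """Item-item co-occurrence counts (R^T R) with a zeroed diagonal."""
    r_t = [list(col) for col in zip(*r)]
    m = matmul(r_t, r)
    for j in range(len(m)):
        m[j][j] = 0
    return m

def recommend_scores(r):
    """Mapping-based CF: scores = interaction matrix x mapping matrix."""
    return matmul(r, cooccurrence_mapping(r))

# Rows are users, columns are items (toy data).
r = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]
scores_before = recommend_scores(r)

# "Unlearn" user 0's interaction with item 1 by correcting the interaction
# matrix, then rebuilding the mapping from the corrected matrix.
r_corrected = [row[:] for row in r]
r_corrected[0][1] = 0
scores_after = recommend_scores(r_corrected)
print(scores_before[2], scores_after[2])
```

User 2's score for item 1 drops from 1 to 0 once user 0's interaction is removed, showing that forgetting one interaction has to propagate through both the interaction matrix and the mapping matrix.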
ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint
Jiang, Zhenhao, Zeng, Biao, Feng, Hao, Liu, Jin, Fan, Jicong, Zhang, Jie, Jia, Jia, Hu, Ning, Chen, Xingyu, Lan, Xuguang
Large-scale online recommender systems are spread all over the Internet and are in charge of two basic tasks: Click-Through Rate (CTR) and Post-Click Conversion Rate (CVR) estimation. However, traditional CVR estimators suffer from the well-known Sample Selection Bias and Data Sparsity issues. Entire space models were proposed to address the two issues by tracing the decision-making path of "exposure_click_purchase". Further, some researchers observed that there are purchase-related behaviors between click and purchase, which can better capture the user's decision-making intention and improve the recommendation performance. Thus, the decision-making path has been extended to "exposure_click_in-shop action_purchase" and can be modeled with a conditional probability approach. Nevertheless, we observe that the chain rule of conditional probability does not always hold. We report the Probability Space Confusion (PSC) issue and mathematically derive the difference between the ground truth and the estimation. We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue. Specifically, we handle "exposure_click_in-shop action" and "in-shop action_purchase" separately in light of the characteristics of in-shop action. The first path is still treated with conditional probability while the second one is treated with a parameter constraint strategy. Experiments in both offline and online environments in a large-scale recommendation system illustrate the superiority of our proposed methods over state-of-the-art models. The real-world datasets will be released.
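For orientation, the conditional probability modeling referred to above is usually written as a chain-rule factorization over the decision path. The notation below is an assumption for illustration (events $c$ = click, $a$ = in-shop action, $p$ = purchase, all conditioned on an exposure $x$); the abstract itself does not give the formulas.

```latex
% Two-step path exposure -> click -> purchase
P(c, p \mid x) = P(c \mid x)\, P(p \mid c, x)

% Extended path exposure -> click -> in-shop action -> purchase
P(c, a, p \mid x) = P(c \mid x)\, P(a \mid c, x)\, P(p \mid a, c, x)
```

As a worked example of the first line with made-up counts: 1000 exposures, 100 clicks, and 10 post-click purchases give $P(c \mid x) = 0.1$, $P(p \mid c, x) = 0.1$, and a product of $0.01$, which matches the directly counted rate $10/1000$. The paper's claim is that this kind of agreement can break down on the extended path, which is what it calls the PSC issue.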
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Barker, Matthew, Kallina, Emma, Ashok, Dhananjay, Collins, Katherine M., Casovan, Ashley, Weller, Adrian, Talwalkar, Ameet, Chen, Valerie, Bhatt, Umang
Even though machine learning (ML) pipelines affect an increasing array of stakeholders, there is little work on how input from stakeholders is recorded and incorporated. We propose FeedbackLogs, addenda to existing documentation of ML pipelines, to track the input of multiple stakeholders. Each log records important details about the feedback collection process, the feedback itself, and how the feedback is used to update the ML pipeline. In this paper, we introduce and formalise a process for collecting a FeedbackLog. We also provide concrete use cases where FeedbackLogs can be employed as evidence for algorithmic auditing and as a tool to record updates based on stakeholder feedback.
'Far better than Apple's offering': Get Alexa in your ears and score Amazon's Echo Buds with noise cancellation on sale for less than £60 (that's 58% off!)
And anything that stops us from forgetting something on our shopping list has got to be a winner! Another great thing about the earbuds is that they have a noise-cancelling and sealed design. That means you can block out all of the background noise that can stop you from enjoying your music to the max.
Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation
Wang, Chenxu, Feng, Fuli, Zhang, Yang, Wang, Qifan, Hu, Xunhan, He, Xiangnan
Historical interactions are the default choice for recommender model training, and they typically exhibit high sparsity, i.e., most user-item pairs are unobserved missing data. A standard choice is treating the missing data as negative training samples and estimating interaction likelihood between user-item pairs along with the observed interactions. In this way, some potential interactions are inevitably mislabeled during training, which hurts model fidelity and hinders the model from recalling the mislabeled items, especially the long-tail ones. In this work, we investigate the mislabeling issue from a new perspective of aleatoric uncertainty, which describes the inherent randomness of missing data. The randomness pushes us to go beyond merely the interaction likelihood and embrace aleatoric uncertainty modeling. Towards this end, we propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework that consists of a new uncertainty estimator along with a normal recommender model. According to the theory of aleatoric uncertainty, we derive a new recommendation objective to learn the estimator. As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty, which is demonstrated to improve the recommendation performance of less popular items without sacrificing the overall performance. We instantiate AUR on three representative recommender models: Matrix Factorization (MF), LightGCN, and VAE from mainstream model architectures. Extensive results on two real-world datasets validate the effectiveness of AUR w.r.t. better recommendation results, especially on long-tail items.
Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application
Yuan, Jianjun, Woon, Wei Lee, Coba, Ludovik
This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $\mathcal{O}(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.