Tang, Ruiming
Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System
He, Bowei, He, Xu, Zhang, Renrui, Zhang, Yingxue, Tang, Ruiming, Ma, Chen
With the continuous increase of users and items, conventional recommender systems trained on static datasets can hardly adapt to changing environments. The high-throughput data requires the model to be updated in a timely manner to capture user interest dynamics, which has led to the emergence of streaming recommender systems. Due to the prevalence of deep learning-based recommender systems, the embedding layer is widely adopted to represent the characteristics of users, items, and other features in low-dimensional vectors. However, it has been shown that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost, especially for streaming recommendations. To tackle this problem, we first rethink the streaming model update process and model the dynamic embedding size search as a bandit problem. Then, we analyze and quantify the factors that influence the optimal embedding sizes from a statistical perspective. Based on this, we propose the \textbf{D}ynamic \textbf{E}mbedding \textbf{S}ize \textbf{S}earch (\textbf{DESS}) method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner. Theoretically, we obtain a sublinear regret upper bound superior to those of previous methods. Empirical results across two recommendation tasks on four public datasets also demonstrate that our approach achieves better streaming recommendation performance with lower memory cost and higher time efficiency.
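The abstract above casts embedding size search as a bandit problem. As a minimal illustrative sketch (not the authors' DESS algorithm, which uses a non-stationary variant with theoretical regret guarantees), one can treat each candidate embedding size as an arm of a multi-armed bandit and select sizes with a standard UCB1 rule; all names and the toy reward below are hypothetical:

```python
import math

class UCBSizeSelector:
    """Toy UCB1 selector over candidate embedding sizes (illustrative only)."""

    def __init__(self, candidate_sizes):
        self.sizes = list(candidate_sizes)
        self.counts = [0] * len(self.sizes)    # times each size was tried
        self.values = [0.0] * len(self.sizes)  # running mean reward per size

    def select(self):
        total = sum(self.counts)
        for i, c in enumerate(self.counts):
            if c == 0:                         # try every size once first
                return i
        ucb = [self.values[i] + math.sqrt(2 * math.log(total) / self.counts[i])
               for i in range(len(self.sizes))]
        return max(range(len(self.sizes)), key=lambda i: ucb[i])

    def update(self, i, reward):
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]

selector = UCBSizeSelector([8, 16, 32, 64])
for step in range(100):
    arm = selector.select()
    # In practice the reward would be e.g. validation AUC of the model
    # trained with that size; here a toy reward favouring size 32.
    reward = 1.0 if selector.sizes[arm] == 32 else 0.2
    selector.update(arm, reward)
best = max(range(len(selector.sizes)), key=lambda i: selector.counts[i])
```

After 100 rounds the selector concentrates its pulls on the size with the highest reward, while still occasionally exploring the alternatives.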
AutoAssign+: Automatic Shared Embedding Assignment in Streaming Recommendation
Liu, Ziru, Chen, Kecheng, Song, Fengyi, Chen, Bo, Zhao, Xiangyu, Guo, Huifeng, Tang, Ruiming
With the rapid growth of personalized online applications, recommender systems have been widely implemented by various online businesses, including E-commerce websites, news platforms, online advertising, and so on [1, 2]. Among them, streaming recommendation [3, 4] is one of the common forms of recommender systems, where streaming data constantly flow into the recommendation models for training, thus better modeling the user's current preferences. In addition, streaming recommendations are particularly important for time-sensitive items, such as news, as they allow for rapid identification and distribution of relevant content to interested users, which is critical for commercial information retrieval systems. Thanks to their ability to effectively capture highly nonlinear relationships between users and items end-to-end, neural network-based models are rapidly becoming the mainstream of recommender systems. As shown in Figure 1, existing deep recommendation models typically follow the "Embedding & Feature Interaction" paradigm [5]. The embedding layer serves as the encoder to represent sparse features in a dense latent space, while the feature interaction layers capture interactive signals among these features. In a streaming recommender system, new items and users are continually added to the data corpus, creating a highly dynamic streaming environment that presents several challenges, which can be summarized as: Cold-start: The streaming recommender system is confronted with a constant influx of new users, many of whom can be classified as visitor-type users and possess extremely limited behavior information. Furthermore, the system is constantly updated with new items, yet these items have not accumulated enough interactions to provide adequate training data. Employing insufficiently trained new user/item embeddings leads to a significant decline in the performance of the recommendation model.
MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction
Lin, Jianghao, Qu, Yanru, Guo, Wei, Dai, Xinyi, Tang, Ruiming, Yu, Yong, Zhang, Weinan
With the widespread application of personalized online services, click-through rate (CTR) prediction has received more and more attention and research. The most prominent features of CTR prediction are its multi-field categorical data format and its vast, daily-growing data volume. The large capacity of neural models helps digest such massive amounts of data under the supervised learning paradigm, yet they fail to utilize the substantial data to its full potential, since the 1-bit click signal is not sufficient to guide the model to learn capable representations of features and instances. The self-supervised learning paradigm provides a more promising pretrain-finetune solution to better exploit the large amount of user click logs and to learn more generalized and effective representations. However, self-supervised learning for CTR prediction is still an open question, since existing work in this direction remains preliminary and rudimentary. To this end, we propose a Model-agnostic Pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data. More specifically, we derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD). MFP digs into feature interactions within each instance by masking and predicting a small portion of input features, and introduces noise contrastive estimation (NCE) to handle large feature spaces. RFD further turns MFP into a binary classification task by replacing and detecting changes in input features, making it even simpler and more effective for CTR pretraining. Our extensive experiments on two real-world large-scale datasets (i.e., Avazu, Criteo) demonstrate the advantages of these two methods on several strong backbones (e.g., DCNv2, DeepFM), where they achieve new state-of-the-art performance in terms of both effectiveness and efficiency for CTR prediction.
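To make the RFD-style corruption concrete, here is a minimal hypothetical sketch: a fraction of a sample's categorical fields is replaced with other values from each field's vocabulary, and binary labels mark which fields were replaced. A detector network (not shown) would then be trained to classify each field as original vs. replaced. All names and the toy vocabularies are assumptions for illustration:

```python
import random

def corrupt(sample, vocab, replace_prob, rng):
    """Randomly replace fields of a categorical sample; return (corrupted, labels)."""
    corrupted, labels = [], []
    for field, value in enumerate(sample):
        if rng.random() < replace_prob:
            # Draw a *different* value from this field's vocabulary.
            candidates = [v for v in vocab[field] if v != value]
            corrupted.append(rng.choice(candidates))
            labels.append(1)   # 1 = replaced
        else:
            corrupted.append(value)
            labels.append(0)   # 0 = kept as-is
    return corrupted, labels

rng = random.Random(0)
vocab = [["US", "UK", "CN"], ["ios", "android"], ["mon", "tue", "wed"]]
sample = ["US", "ios", "mon"]
corrupted, labels = corrupt(sample, vocab, replace_prob=0.5, rng=rng)
```

The per-field binary labels give a dense training signal, which is part of why the abstract describes RFD as simpler than reconstructing masked values over a large feature space.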
How Can Recommender Systems Benefit from Large Language Models: A Survey
Lin, Jianghao, Dai, Xinyi, Xi, Yunjia, Liu, Weiwen, Chen, Bo, Li, Xiangyang, Zhu, Chenxu, Guo, Huifeng, Yu, Yong, Tang, Ruiming, Zhang, Weinan
Recommender systems (RS) play an important role in matching users' information needs in Internet applications. In natural language processing (NLP), large language models (LLMs) have shown astonishing emergent abilities (e.g., instruction following, reasoning), giving rise to the promising research direction of adapting LLMs to RS for performance enhancements and user experience improvements. In this paper, we conduct a comprehensive survey of this research direction from an application-oriented view. We first summarize existing research works from two orthogonal perspectives: where and how to adapt LLMs to RS. For the "WHERE" question, we discuss the roles that LLMs could play in different stages of the recommendation pipeline, i.e., feature engineering, feature encoder, scoring/ranking function, and pipeline controller. For the "HOW" question, we investigate the training and inference strategies, resulting in two fine-grained taxonomy criteria, i.e., whether to tune LLMs or not, and whether to involve a conventional recommendation model (CRM) for inference. Detailed analysis and general development trajectories are provided for both questions. Then, we highlight key challenges in adapting LLMs to RS from three aspects, i.e., efficiency, effectiveness, and ethics. Finally, we summarize the survey and discuss future prospects. We also actively maintain a GitHub repository for papers and other related resources in this rising direction: https://github.com/CHIANGEL/Awesome-LLM-for-RecSys.
Contrastive Multi-view Framework for Customer Lifetime Value Prediction
Wu, Chuhan, Li, Jingjie, Jia, Qinglin, Zhu, Hong, Fang, Yuan, Tang, Ruiming
Accurate customer lifetime value (LTV) prediction can help service providers optimize their marketing policies in customer-centric applications. However, the heavy sparsity of consumption events and the interference of data variance and noise obstruct LTV estimation. Many existing LTV prediction methods directly train a single-view LTV predictor on consumption samples, which may yield inaccurate and even biased knowledge extraction. In this paper, we propose a contrastive multi-view framework for LTV prediction, which is a plug-and-play solution compatible with various backbone models. It synthesizes multiple heterogeneous LTV regressors with complementary knowledge to improve model robustness, and captures sample relatedness via contrastive learning to mitigate the dependency on data abundance. Concretely, we use a decomposed scheme that converts the LTV prediction problem into a combination of estimating consumption probability and payment amount. To alleviate the impact of noisy data on model learning, we propose a multi-view framework that jointly optimizes multiple types of regressors with diverse characteristics and advantages to encode and fuse comprehensive knowledge. To fully exploit the potential of limited training samples, we propose a hybrid contrastive learning method to help capture the relatedness between samples in both classification and regression tasks. We conduct extensive experiments on a real-world game LTV prediction dataset, and the results validate the effectiveness of our method. We have deployed our solution online in Huawei's mobile game center and achieved a 32.26% gain in total payment amount.
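The decomposed scheme mentioned above factors the LTV estimate into a consumption probability times an expected payment amount. A minimal sketch of that combination step, assuming a classification head producing a pay/no-pay logit and a regression head producing a log-scale amount (both names hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_ltv(pay_logit, log_amount):
    """Combine the two heads: E[LTV] = P(pay) * E[amount | pay]."""
    p_pay = sigmoid(pay_logit)      # consumption probability head
    amount = math.exp(log_amount)   # amount head, predicted on a log scale
    return p_pay * amount

# Example: 50% chance of paying, expected amount 20.0 given payment.
ltv = predict_ltv(pay_logit=0.0, log_amount=math.log(20.0))
```

Predicting the amount on a log scale is a common choice for heavy-tailed payment distributions; the paper's multi-view and contrastive components then operate on top of such decomposed predictors.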
Compressed Interaction Graph based Framework for Multi-behavior Recommendation
Guo, Wei, Meng, Chang, Yuan, Enming, He, Zhicheng, Guo, Huifeng, Zhang, Yingxue, Chen, Bo, Hu, Yaochen, Tang, Ruiming, Li, Xiu, Zhang, Rui
Multiple types of user behavior data (e.g., clicking, adding to cart, and purchasing) are recorded in most real-world recommendation scenarios, which can help to learn users' multi-faceted preferences. However, it is challenging to explore multi-behavior data due to the unbalanced data distribution and sparse target behavior, which lead to inadequate modeling of high-order relations when treating multi-behavior data ''as features'' and gradient conflict in multi-task learning when treating multi-behavior data ''as labels''. In this paper, we propose CIGF, a Compressed Interaction Graph based Framework, to overcome the above limitations. Specifically, we design a novel Compressed Interaction Graph Convolution Network (CIGCN) to model instance-level high-order relations explicitly. To alleviate the potential gradient conflict when treating multi-behavior data ''as labels'', we propose a Multi-Expert with Separate Input (MESI) network on top of CIGCN for multi-task learning. Comprehensive experiments on three large-scale real-world datasets demonstrate the superiority of CIGF. Ablation studies and in-depth analysis further validate the effectiveness of our proposed model in capturing high-order relations and alleviating gradient conflict. The source code and datasets are available at https://github.com/MC-CV/CIGF.
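The multi-expert idea above can be sketched in miniature: each task feeds its own input to a set of shared experts and combines their outputs with a task-specific softmax gate, so gradients from different behavior labels interfere less. This toy uses scalar-valued experts and is only a structural sketch of the MESI layout, not the paper's implementation; all names are hypothetical:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mesi_forward(task_inputs, experts, gates):
    """One output per task: gated mixture of shared experts on that task's input."""
    outputs = []
    for t, x in enumerate(task_inputs):
        expert_outs = [f(x) for f in experts]   # shared experts
        weights = softmax(gates[t])             # task-specific gate logits
        outputs.append(sum(w * o for w, o in zip(weights, expert_outs)))
    return outputs

experts = [lambda x: sum(x), lambda x: max(x)]  # toy "experts"
gates = [[0.0, 0.0], [10.0, -10.0]]             # task 0 mixes evenly; task 1 prefers expert 0
outs = mesi_forward([[1.0, 2.0], [3.0, 4.0]], experts, gates)
```

Because each task has its own input and gate, a conflicting gradient from one behavior's label mostly reshapes that task's gate rather than fighting over a single shared representation.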
A Survey on User Behavior Modeling in Recommender Systems
He, Zhicheng, Liu, Weiwen, Guo, Wei, Qin, Jiarui, Zhang, Yingxue, Hu, Yaochen, Tang, Ruiming
User Behavior Modeling (UBM) plays a critical role in user interest learning and has been extensively used in recommender systems. By exploiting crucial interaction patterns between users and items, it has brought compelling improvements to many recommendation tasks. In this paper, we attempt to provide a thorough survey of this research topic. We start by reviewing the research background of UBM. Then, we provide a systematic taxonomy of existing UBM research works, which can be categorized into four different directions: Conventional UBM, Long-Sequence UBM, Multi-Type UBM, and UBM with Side Information. Within each direction, representative models and their strengths and weaknesses are comprehensively discussed. Besides, we elaborate on the industrial practices of UBM methods with the hope of providing insights into the application value of existing UBM solutions. Finally, we summarize the survey and discuss the future prospects of this field.
A Comprehensive Survey on Automated Machine Learning for Recommendations
Chen, Bo, Zhao, Xiangyu, Wang, Yejing, Fan, Wenqi, Guo, Huifeng, Tang, Ruiming
Deep recommender systems (DRS) are critical for current commercial online service providers, as they address the issue of information overload by recommending items that are tailored to the user's interests and preferences. They offer unprecedented effectiveness in feature representation and the capacity to model non-linear relationships between users and items. Despite their advancements, DRS models, like other deep learning models, employ sophisticated neural network architectures and other vital components that are typically designed and tuned by human experts. This article gives a comprehensive summary of automated machine learning (AutoML) for developing DRS models. We first provide an overview of AutoML for DRS models and the related techniques. Then we discuss the state-of-the-art AutoML approaches that automate feature selection, feature embeddings, feature interactions, and model training in DRS. We point out that existing AutoML-based recommender systems are evolving toward multi-component joint search with an abstract search space and efficient search algorithms. Finally, we discuss appealing research directions and summarize the survey.
Adapting Triplet Importance of Implicit Feedback for Personalized Recommendation
Wu, Haolun, Ma, Chen, Zhang, Yingxue, Liu, Xue, Tang, Ruiming, Coates, Mark
Implicit feedback is frequently used for developing personalized recommendation services due to its ubiquity and accessibility in real-world systems. In order to effectively utilize such information, most research adopts the pairwise ranking method on constructed training triplets (user, positive item, negative item) and aims to distinguish between positive and negative items for each user. However, most of these methods treat all the training triplets equally, ignoring the subtle differences between different positive or negative items. On the other hand, even though some other works make use of the auxiliary information (e.g., dwell time) of user behaviors to capture these subtle differences, such auxiliary information is hard to obtain. To mitigate the aforementioned problems, we propose a novel training framework named Triplet Importance Learning (TIL), which adaptively learns the importance score of training triplets. We devise two strategies for importance score generation and formulate the whole procedure as a bilevel optimization, which does not require any rule-based design. We integrate the proposed training procedure with several Matrix Factorization (MF)- and Graph Neural Network (GNN)-based recommendation models, demonstrating the compatibility of our framework. Through comparisons with many state-of-the-art methods on three real-world datasets, we show that our proposed method outperforms the best existing models by 3-21\% in terms of Recall@k for top-k recommendation.
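To show where a per-triplet importance score enters the pairwise objective, here is a minimal sketch of an importance-weighted BPR-style loss. In TIL the weights are learned via bilevel optimization; here they are simply given, and all names are hypothetical:

```python
import math

def weighted_bpr_loss(triplets):
    """Mean of -w * log(sigmoid(s_pos - s_neg)) over (s_pos, s_neg, w) triplets."""
    total = 0.0
    for s_pos, s_neg, w in triplets:
        sigma = 1.0 / (1.0 + math.exp(-(s_pos - s_neg)))
        total += -w * math.log(sigma)   # importance weight w scales each triplet
    return total / len(triplets)

# One triplet where the positive item is scored 1.0 higher than the negative.
loss = weighted_bpr_loss([(2.0, 1.0, 1.0)])
```

Setting all weights to 1 recovers plain BPR; down-weighting noisy triplets (e.g., accidental clicks) is exactly the degree of freedom the learned importance scores provide.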
Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction
Li, Shiwei, Guo, Huifeng, Hou, Lu, Zhang, Wei, Tang, Xing, Tang, Ruiming, Zhang, Rui, Li, Ruixuan
Embedding tables are usually huge in click-through rate (CTR) prediction models. To train and deploy CTR models efficiently and economically, it is necessary to compress their embedding tables at the training stage. To this end, we formulate a novel quantization training paradigm, termed low-precision training (LPT), that compresses the embeddings during training. We also provide a theoretical analysis of its convergence. The results show that stochastic weight quantization has a faster convergence rate and a smaller convergence error than deterministic weight quantization in LPT. Further, to reduce the accuracy degradation, we propose adaptive low-precision training (ALPT), which learns the step size (i.e., the quantization resolution) through gradient descent. Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve the prediction accuracy, especially at extremely low bit widths. For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy. The code of ALPT is publicly available.
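The stochastic weight quantization discussed above can be sketched as unbiased stochastic rounding onto a uniform grid with step size alpha. This toy omits the part that makes ALPT adaptive (learning alpha by gradient descent) and all names are hypothetical:

```python
import math
import random

def stochastic_quantize(x, alpha, bits, rng):
    """Stochastically round x/alpha to the b-bit integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    scaled = max(-qmax, min(qmax, x / alpha))   # clamp to representable range
    lo = math.floor(scaled)
    # Round up with probability equal to the fractional part, so the
    # quantizer is unbiased: E[q] == scaled (inside the clamp range).
    q = lo + (1 if rng.random() < scaled - lo else 0)
    return q * alpha

rng = random.Random(0)
# Averaging many stochastic quantizations of 0.37 with step 0.1 recovers
# roughly 0.37, even though each sample is either 0.3 or 0.4.
est = sum(stochastic_quantize(0.37, alpha=0.1, bits=8, rng=rng)
          for _ in range(10000)) / 10000
```

This unbiasedness is what the abstract's convergence analysis leverages: deterministic rounding introduces a systematic bias at every step, whereas stochastic rounding averages out over training.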