query user
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems
Kolodner, Matthew, Ju, Mingxuan, Fan, Zihao, Zhao, Tong, Ghazizadeh, Elham, Wu, Yan, Shah, Neil, Liu, Yozen
Improving recommendation systems (RS) can greatly enhance the user experience across many domains, such as social media. Many RS utilize embedding-based retrieval (EBR) approaches to retrieve candidates for recommendation. In an EBR system, the embedding quality is key. According to recent literature, self-supervised multitask learning (SSMTL) has showed strong performance on academic benchmarks in embedding learning and resulted in an overall improvement in multiple downstream tasks, demonstrating a larger resilience to the adverse conditions between each downstream task and thereby increased robustness and task generalization ability through the training objective. However, whether or not the success of SSMTL in academia as a robust training objectives translates to large-scale (i.e., over hundreds of million users and interactions in-between) industrial RS still requires verification. Simply adopting academic setups in industrial RS might entail two issues. Firstly, many self-supervised objectives require data augmentations (e.g., embedding masking/corruption) over a large portion of users and items, which is prohibitively expensive in industrial RS. Furthermore, some self-supervised objectives might not align with the recommendation task, which might lead to redundant computational overheads or negative transfer. In light of these two challenges, we evaluate using a robust training objective, specifically SSMTL, through a large-scale friend recommendation system on a social media platform in the tech sector, identifying whether this increase in robustness can work at scale in enhancing retrieval in the production setting. Through online A/B testing with SSMTL-based EBR, we observe statistically significant increases in key metrics in the friend recommendations, with up to 5.45% improvements in new friends made and 1.91% improvements in new friends made with cold-start users.
Spatio-Temporal Signatures of User-Centric Data: How Similar Are We?
Shukla, Samta (Rensselaer Polytechnic Institute) | Telang, Aditya (IBM Reasearch, India) | Joshi, Salil (IBM Reasearch, India) | Subramaniam, L. Venkat (IBM Reasearch, India)
Much work has been done on understanding and predicting human mobility in time. In this work, we are interested in obtaining a set of users who are spatio-temporally most similar to a query user. We propose an efficient way of user data representation called Spatio-Temporal Signatures to keep track of complete record of user movement. We define a measure called Spatio-Temporal similarity for comparing a given pair of users. Although computing exact pairwise Spatio-Temporal similarities between query user with all users is inefficient, we show that with our hybrid pruning scheme the most similar users can be obtained in logarithmic time with in a (1+\epsilon) factor approximation of the optimal. We are developing a framework to test our models against a real dataset of urban users.
Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes
Yu, Kai, Schwaighofer, Anton, Tresp, Volker, Ma, Wei-Ying, Zhang, HongJiang
Collaborative filtering (CF) and content-based filtering (CBF) have widely been used in information filtering applications. Both approaches have their strengths and weaknesses which is why researchers have developed hybrid systems. This paper proposes a novel approach to unify CF and CBF in a probabilistic framework, named collaborative ensemble learning. It uses probabilistic SVMs to model each user's profile (as CBF does).At the prediction phase, it combines a society OF users profiles, represented by their respective SVM models, to predict an active users preferences(the CF idea).The combination scheme is embedded in a probabilistic framework and retains an intuitive explanation.Moreover, collaborative ensemble learning does not require a global training stage and thus can incrementally incorporate new data.We report results based on two data sets. For the Reuters-21578 text data set, we simulate user ratings under the assumption that each user is interested in only one category. In the second experiment, we use users' opinions on a set of 642 art images that were collected through a web-based survey. For both data sets, collaborative ensemble achieved excellent performance in terms of recommendation accuracy.