Lamkhede, Sudarshan
Sliding Window Training -- Utilizing Historical Recommender Systems Data for Foundation Models
Joshi, Swanand, Feng, Yesu, Hsiao, Ko-Jen, Zhang, Zhe, Lamkhede, Sudarshan
Long-lived recommender systems (RecSys) often encounter lengthy user-item interaction histories that span many years. To effectively learn long-term user preferences, large RecSys foundation models (FM) need to encode this information in pretraining. Usually, this is done either by generating a long enough sequence length to take all history sequences as input, at the cost of a large model input dimension, or by dropping some parts of the user history to accommodate model size and latency requirements on the production side. Oftentimes in industrial applications, foundation models face inference-time restrictions: the serving memory footprint cannot exceed a certain input dimension and model size. This constraint raises the question of how to most effectively utilize a large-scale interaction corpus [1]. The most straightforward way is to truncate historical interactions. This simplification, however, comes at the cost of not using valuable information about user journeys and their rich history of interactions during model training [5].
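The sliding-window idea named in the title can be sketched as follows. This is an illustrative assumption about the technique, not the paper's actual implementation: instead of keeping only the most recent `window_size` interactions, every overlapping window of the history becomes a training example, so older interactions still contribute to pretraining while the model's input length stays fixed. The function name and the `window_size`/`stride` parameters are hypothetical.

```python
def sliding_window_samples(history, window_size, stride):
    """Yield fixed-length training windows over a long interaction history.

    Rather than truncating to the last `window_size` items, each window
    contributes a training example, so long-term preferences encoded in
    older interactions are still seen during pretraining while the model
    input dimension stays constant.
    """
    if len(history) <= window_size:
        return [history]
    return [
        history[start:start + window_size]
        for start in range(0, len(history) - window_size + 1, stride)
    ]

# Example: a user with 10 interactions, model input capped at 4 items.
history = list(range(10))
print(sliding_window_samples(history, window_size=4, stride=3))
# -> [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With `stride < window_size` the windows overlap, trading extra training examples for redundancy; with `stride == window_size` the history is simply chunked.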
Synergistic Signals: Exploiting Co-Engagement and Semantic Links via Graph Neural Networks
Huang, Zijie, Li, Baolin, Asgharzadeh, Hafez, Cocos, Anne, Liu, Lingyi, Cox, Evan, Wise, Colby, Lamkhede, Sudarshan
Given a set of candidate entities (e.g. movie titles), the ability to identify similar entities is a core capability of many recommender systems. Most often this is achieved by collaborative filtering approaches, i.e. if users co-engage with a pair of entities frequently enough, the embeddings should be similar. However, relying on co-engagement data alone can result in lower-quality embeddings for new and unpopular entities. We study this problem in the context of recommender systems at Netflix. We observe that there is abundant semantic information such as genre, content maturity level, themes, etc. that complements co-engagement signals and provides interpretability in similarity models. To learn entity similarities from both data sources holistically, we propose a novel graph-based approach called SemanticGNN. SemanticGNN models entities, semantic concepts, collaborative edges, and semantic edges within a large-scale knowledge graph and conducts representation learning over it. Our key technical contributions are twofold: (1) we develop a novel relation-aware attention graph neural network (GNN) to handle the imbalanced distribution of relation types in our graph; (2) to handle web-scale graph data with millions of nodes and billions of edges, we develop a novel distributed graph training paradigm. The proposed model is successfully deployed within Netflix, and empirical experiments indicate it yields up to 35% improvement in performance on similarity judgment tasks.
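A minimal sketch of relation-aware attention, under stated assumptions: the paper's architecture is not specified in the abstract, so the per-relation transform `W_rel`, attention vector `a_rel`, and residual update below are illustrative choices. The core idea shown is that attention logits use relation-specific parameters, so sparse semantic edges are scored on their own terms rather than being drowned out by abundant collaborative (co-engagement) edges.

```python
import numpy as np

def relation_aware_attention_layer(h, edges, W_rel, a_rel):
    """One message-passing step with relation-specific attention.

    h      : (num_nodes, d) array of node features
    edges  : list of (src, dst, rel) tuples
    W_rel  : dict mapping relation type -> (d, d) transform matrix
    a_rel  : dict mapping relation type -> (2*d,) attention vector
    """
    incoming = {}  # dst -> list of (src, rel, attention logit)
    for src, dst, rel in edges:
        z = np.concatenate([h[dst], h[src] @ W_rel[rel]])
        logit = np.tanh(z @ a_rel[rel])  # relation-aware attention logit
        incoming.setdefault(dst, []).append((src, rel, logit))
    h_out = h.copy()
    for dst, neighbors in incoming.items():
        logits = np.array([e for _, _, e in neighbors])
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()  # softmax over dst's in-neighbors
        msg = sum(a * (h[src] @ W_rel[rel])
                  for a, (src, rel, _) in zip(alpha, neighbors))
        h_out[dst] = h[dst] + msg  # residual update keeps isolated nodes stable
    return h_out

# Tiny example: two entity nodes linked by a collaborative ('co') edge,
# one linked to a semantic concept node by a semantic ('sem') edge.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
W = {'co': rng.normal(size=(3, 3)), 'sem': rng.normal(size=(3, 3))}
a = {'co': rng.normal(size=(6,)), 'sem': rng.normal(size=(6,))}
edges = [(0, 1, 'co'), (2, 1, 'sem'), (3, 2, 'co')]
h_new = relation_aware_attention_layer(h, edges, W, a)
```

Nodes with no incoming edges are returned unchanged; node 1 aggregates messages from both a collaborative and a semantic neighbor, each scored with its own relation's attention parameters.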