unseen user
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Bose, Avinandan, Xiong, Zhihan, Chi, Yuejie, Du, Simon Shaolei, Xiao, Lin, Fazel, Maryam
Personalizing large language models (LLMs) to accommodate diverse user preferences is essential for enhancing alignment and user satisfaction. Traditional reinforcement learning from human feedback (RLHF) approaches often rely on monolithic value representations, limiting their ability to adapt to individual preferences. We introduce a novel framework that leverages low-rank preference modeling to efficiently learn and generalize user-specific reward functions. By representing reward functions in a low-dimensional subspace and modeling individual preferences as weighted combinations of shared basis functions, our approach avoids rigid user categorization while enabling scalability and few-shot adaptation. We validate our method on multiple preference datasets, demonstrating superior generalization to unseen users and improved accuracy in preference prediction tasks.
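The low-rank idea in this abstract can be illustrated with a minimal sketch. It assumes each response has already been scored by K shared basis reward functions, and that a new user's weights are fit from a few pairwise comparisons with a Bradley-Terry style logistic loss. The function names, the loss, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math

def user_reward(weights, basis_values):
    """Reward for one response: weighted combination of shared basis rewards."""
    return sum(w * b for w, b in zip(weights, basis_values))

def fit_user_weights(pairs, num_basis, steps=200, lr=0.5):
    """Few-shot fit of one user's weights from (chosen, rejected) pairs.

    Each element of a pair is the vector of basis rewards for a response.
    The basis is held fixed; only the K user weights are learned, by
    gradient ascent on a Bradley-Terry log-likelihood (an assumed loss).
    """
    weights = [0.0] * num_basis
    for _ in range(steps):
        grad = [0.0] * num_basis
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = user_reward(weights, diff)
            # d/dw log(sigmoid(margin)) = (1 - sigmoid(margin)) * diff
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for k in range(num_basis):
                grad[k] += scale * diff[k]
        weights = [w + lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

# Toy data: two hypothetical basis rewards (say, "concise" and "detailed").
# The new user consistently prefers responses that score high on the second.
pairs = [
    ([0.2, 0.9], [0.8, 0.1]),
    ([0.1, 0.7], [0.9, 0.3]),
    ([0.3, 0.8], [0.6, 0.2]),
]
new_user_weights = fit_user_weights(pairs, num_basis=2)
```

Because only K weights are learned per user while the basis is shared, a handful of comparisons is enough to place an unseen user in the subspace, which is the few-shot adaptation property the abstract describes.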
CoPL: Collaborative Preference Learning for Personalizing LLMs
Choi, Youngbin, Cho, Seunghyuk, Lee, Minjong, Park, MoonJeong, Ko, Yesong, Ok, Jungseul, Kim, Dongwoo
Personalizing large language models (LLMs) is important for aligning outputs with diverse user preferences, yet existing methods struggle with flexibility and generalization. We propose CoPL (Collaborative Preference Learning), a graph-based collaborative filtering framework that models user-response relationships to enhance preference estimation, particularly in sparse annotation settings. By integrating a mixture of LoRA experts, CoPL efficiently fine-tunes LLMs while dynamically balancing shared and user-specific preferences. Additionally, an optimization-free adaptation strategy enables generalization to unseen users without fine-tuning. Experiments on UltraFeedback-P demonstrate that CoPL outperforms existing personalized reward models, effectively capturing both common and controversial preferences, making it a scalable solution for personalized LLM alignment.
Item Graph Convolution Collaborative Filtering for Inductive Recommendations
D'Amico, Edoardo, Muhammad, Khalil, Tragos, Elias, Smyth, Barry, Hurley, Neil, Lawlor, Aonghus
Graph Convolutional Networks (GCN) have recently been employed as a core component in the construction of recommender system algorithms, interpreting user-item interactions as the edges of a bipartite graph. However, in the absence of side information, the majority of existing models adopt an approach of randomly initialising the user embeddings and optimising them throughout the training process. This strategy makes these algorithms inherently transductive, curtailing their ability to generate predictions for users that were unseen at training time. To address this issue, we propose a convolution-based algorithm that is inductive from the user perspective while depending only on implicit user-item interaction data. We propose to construct an item-item graph through a weighted projection of the bipartite interaction network and to employ convolution to inject higher-order associations into item embeddings, while constructing user representations as weighted sums of the items with which they have interacted. Despite not training individual embeddings for each user, our approach achieves state-of-the-art recommendation performance with respect to transductive baselines on four real-world datasets, while also showing robust inductive performance.
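The inductive construction described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the item-item weights are raw co-occurrence counts, a single row-normalized propagation step stands in for the convolution, item embeddings are fixed toy values rather than trained ones, and a user is the plain mean of their items. None of these specific choices come from the paper.

```python
def item_item_projection(interactions, num_items):
    """Co-occurrence weights between items, built from per-user item lists."""
    weights = [[0.0] * num_items for _ in range(num_items)]
    for items in interactions:
        for i in items:
            for j in items:
                if i != j:
                    weights[i][j] += 1.0
    return weights

def propagate(weights, embeddings):
    """One propagation step: average each item with its weighted neighbours."""
    out = []
    for i, row in enumerate(weights):
        total = sum(row)
        if total == 0:
            out.append(list(embeddings[i]))  # isolated item keeps its embedding
            continue
        dim = len(embeddings[i])
        mixed = [
            sum(row[j] * embeddings[j][d] for j in range(len(row))) / total
            for d in range(dim)
        ]
        out.append([(a + b) / 2 for a, b in zip(embeddings[i], mixed)])
    return out

def user_vector(items, embeddings):
    """Represent a user as the mean of the items they interacted with."""
    dim = len(embeddings[0])
    return [sum(embeddings[i][d] for i in items) / len(items) for d in range(dim)]

def score(user_vec, item_vec):
    """Dot-product relevance score between a user and an item."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

# Toy training interactions: items 0 and 1 co-occur, as do items 2 and 3.
train = [[0, 1], [0, 1], [2, 3], [2, 3]]
item_emb = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
smoothed = propagate(item_item_projection(train, num_items=4), item_emb)

# A user never seen in training who has interacted only with item 0.
unseen_user = user_vector([0], smoothed)
```

No per-user parameters exist in this sketch, so a user who was absent at training time is scored from their interaction list alone. Here the unseen user ranks item 1 above item 3 because item 1 co-occurs with item 0.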
Zero-Shot Recommender Systems
Ding, Hao, Ma, Yifei, Deoras, Anoop, Wang, Yuyang, Wang, Hao
Performance of recommender systems (RS) relies heavily on the amount of training data available. This poses a chicken-and-egg problem for early-stage products, whose amount of data, in turn, relies on the performance of their RS. On the other hand, zero-shot learning promises some degree of generalization from an old dataset to an entirely new dataset. In this paper, we explore the possibility of zero-shot learning in RS. We develop an algorithm, dubbed ZEro-Shot Recommenders (ZESRec), that is trained on an old dataset and generalize to a new one where there are neither overlapping users nor overlapping items, a setting that contrasts typical cross-domain RS that has either overlapping users or items. Different from categorical item indices, i.e., item ID, in previous methods, ZESRec uses items' natural-language descriptions (or description embeddings) as their continuous indices, and therefore naturally

Many large scale e-commerce platforms (such as Etsy, Overstock, etc) and online content platforms (such as Spotify, Overstock, Disney, Netflix, etc) have such a large inventory of items that showcasing all of them in front of their users is simply not practical. In particular, in the online content category of businesses, it is often seen that users of their service do not have a crisp intent in mind unlike in the retail shopping experience where the users often have a crisp intent of purchasing something. The need for personalized recommendations therefore arises from the fact that not only it is impractical to show all the items in the catalogue but often times users of such services need help discovering the next best thing -- be it the new and exciting movie or be it a new music album or even a piece of merchandise that they may want to consider for future buying if not immediately.