Goto

Collaborating Authors

 igl-p


Learning personalized reward functions with Interaction-Grounded Learning (IGL)

AIHub

Rewards play a crucial role in reinforcement learning (RL). A good choice of reward function motivates an agent to explore and learn which actions are valuable. The feedback that an agent receives via rewards allows them to update their behavior and learn useful policies. However, designing reward functions is complicated and cumbersome, even for domain experts. Automatically inferring a reward function is more desirable for end-users interacting with a system.


Personalized Reward Learning with Interaction-Grounded Learning (IGL)

Maghakian, Jessica, Mineiro, Paul, Panaganti, Kishan, Rucker, Mark, Saran, Akanksha, Tan, Cheng

arXiv.org Artificial Intelligence

In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i) implicit signals can be used by users in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than requiring a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces. From shopping to reading the news, modern Internet users have access to an overwhelming amount of content and choices from online services. Recommender systems offer a way to improve user experience and decrease information overload by providing a customized selection of content. A key challenge for recommender systems is the rarity of explicit user feedback, such as ratings or likes/dislikes (Grčar et al., 2005). Rather than explicit feedback, practitioners typically use more readily available implicit signals, such as clicks (Hu et al., 2008), webpage dwell time (Yi et al., 2014), or inter-arrival times (Wu et al., 2017) as a proxy signal for user satisfaction. These implicit signals are used as the reward objective in recommender systems, with the popular Click-Through Rate (CTR) metric as the gold standard for the field (Silveira et al., 2019).