McFadden, Jim
Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
Ie, Eugene; Jain, Vihan; Wang, Jing; Narvekar, Sanmit; Agarwal, Ritesh; Wu, Rui; Cheng, Heng-Tze; Lustman, Morgane; Gatto, Vince; Covington, Paul; McFadden, Jim; Chandra, Tushar; Boutilier, Craig
Recommender systems have become ubiquitous, transforming user interactions with products, services and content in a wide variety of domains. In content recommendation, recommenders generally surface relevant and/or novel personalized content based on learned models of user preferences (e.g., as in collaborative filtering [Breese et al., 1998, Konstan et al., 1997, Srebro et al., 2004, Salakhutdinov and Mnih, 2007]) or predictive models of user responses to specific recommendations. Well-known applications of recommender systems include video recommendations on YouTube [Covington et al., 2016], movie recommendations on Netflix [Gomez-Uribe and Hunt, 2016] and playlist construction on Spotify [Jacobson et al., 2016]. It is increasingly common to train deep neural networks (DNNs) [van den Oord et al., 2013, Wang et al., 2015, Covington et al., 2016, Cheng et al., 2016] to predict user responses (e.g., click-through rates, content engagement, ratings, likes) in order to generate, score and serve candidate recommendations. Practical recommender systems largely focus on myopic prediction, that is, estimating a user's immediate response to a recommendation, without considering the long-term impact on subsequent user behavior. This can be limiting: modeling a recommendation's stochastic impact on the future affords opportunities to trade off user engagement in the near term for longer-term benefit (e.g., by probing a user's interests, or improving satisfaction).