Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions
James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, Ben Carterette
Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content in a sequential manner. Providing and evaluating good sequences of recommendations is therefore a central problem for these services. Prior reweighting-based counterfactual evaluation methods either suffer from high variance or make strong independence assumptions about rewards. We propose a new counterfactual estimator that allows for sequential interactions in the rewards with lower variance in an asymptotically unbiased manner. Our method uses graphical assumptions about the causal relationships of the slate to reweight the rewards in the logging policy in a way that approximates the expected sum of rewards under the target policy. Extensive experiments in simulation and on a live recommender system show that our approach outperforms existing methods in terms of bias and data efficiency for the sequential track recommendations problem.
Offline evaluation is challenging because the deployed recommender decides which items the user sees, introducing significant exposure bias in logged data [7, 16, 22]. Various methods have been proposed to mitigate bias using counterfactual evaluation. In this paper, we use terminology from the multi-armed bandit framework to discuss these methods: the recommender performs an action by showing an item depending on the observed context (e.g., user covariates, item covariates, time of day, day of the week) and then observes a reward through the user response (e.g., a stream, a purchase, or length of consumption) [14]. The recommender follows a policy distribution over actions by drawing items stochastically conditioned on the context.
The basic idea of counterfactual evaluation is to estimate how a new policy would have performed if it had been deployed instead of the deployed policy.
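To make the reweighting idea concrete, here is a minimal sketch of standard inverse propensity scoring (IPS) on a single-item, context-free toy problem. It is not the estimator proposed in the paper, which handles slates with sequential reward interactions. The function name `ips_estimate`, the three-item catalogue, and every probability below are hypothetical values chosen only for illustration.

```python
import numpy as np


def ips_estimate(rewards, logging_probs, target_probs):
    """Standard IPS estimate of the mean reward under the target policy.

    Each logged reward is reweighted by the ratio of the probability that
    the target policy would have taken the logged action to the probability
    that the logging policy actually took it.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(
        logging_probs, dtype=float
    )
    return float(np.mean(weights * rewards))


# Hypothetical toy setup: three items, no context, binary reward (e.g. a stream).
rng = np.random.default_rng(0)
n = 200_000
true_reward_prob = np.array([0.1, 0.5, 0.9])  # assumed per-item reward rates
logging_policy = np.array([0.6, 0.3, 0.1])    # deployed (logging) policy
target_policy = np.array([0.1, 0.3, 0.6])     # new policy to evaluate offline

# Simulate logged data collected under the logging policy.
actions = rng.choice(3, size=n, p=logging_policy)
rewards = rng.binomial(1, true_reward_prob[actions])

# Naive average of logged rewards reflects the logging policy, not the target.
naive = float(rewards.mean())

# IPS corrects the exposure bias by reweighting each logged reward.
estimate = ips_estimate(
    rewards, logging_policy[actions], target_policy[actions]
)

# Ground truth is available only because this is a simulation.
true_value = float(target_policy @ true_reward_prob)

print(f"naive={naive:.3f}  ips={estimate:.3f}  true={true_value:.3f}")
```

In this toy setting the naive average tracks the logging policy's value, while the reweighted estimate lands close to the target policy's true value. Applying the same weighting to a whole slate multiplies the per-item ratios together, which is one source of the high variance the abstract attributes to prior methods.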
Aug-23-2020
- Country:
- Europe > United Kingdom
- England (0.14)
- North America > United States (0.28)
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Leisure & Entertainment (1.00)
- Media > Music (1.00)
- Technology: