
 Chandar, Praveen


Model Selection for Production System via Automated Online Experiments

arXiv.org Machine Learning

A challenge that machine learning practitioners in the industry face is the task of selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimation of the effectiveness of the whole system, but can only compare two or a few models due to budget constraints. We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments. We derive the probability distribution of the metric of interest that contains the model uncertainty from our Bayesian surrogate model trained using historical logs. Our method efficiently identifies the best model by sequentially selecting and deploying a list of models from the candidate set that balances exploration and exploitation. Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks.
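
A minimal sketch of the sequential select-and-deploy loop the abstract describes, assuming the Bayesian surrogate can be summarized by posterior samples of the online metric for each candidate model. The toy Gaussian surrogate, the Thompson-sampling-style draw, and names such as posterior_metric_samples and run_online_experiment are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: pick which candidate model to deploy next by sampling
# from a toy per-candidate posterior of the online metric (Thompson-style),
# observing the metric from a simulated online experiment, and repeating.
import numpy as np

rng = np.random.default_rng(0)

def posterior_metric_samples(observations, n_candidates, n_samples=1000):
    """Toy Gaussian surrogate: posterior samples of the metric per candidate."""
    samples = np.empty((n_candidates, n_samples))
    for k in range(n_candidates):
        obs = [m for c, m in observations if c == k]
        mean = np.mean(obs) if obs else 0.0
        std = 1.0 / np.sqrt(len(obs) + 1)        # uncertainty shrinks with data
        samples[k] = rng.normal(mean, std, n_samples)
    return samples

def run_online_experiment(candidate):
    """Stand-in for deploying a model online: noisy draw of its true metric."""
    true_metric = np.linspace(0.0, 1.0, 10)      # candidate 9 is truly best
    return rng.normal(true_metric[candidate], 0.3)

observations = []                                # (candidate, observed metric) log
for _ in range(20):                              # small online-experiment budget
    samples = posterior_metric_samples(observations, n_candidates=10)
    draw = samples[:, rng.integers(samples.shape[1])]   # one joint posterior draw
    chosen = int(np.argmax(draw))                # balances exploration/exploitation
    observations.append((chosen, run_online_experiment(chosen)))

best = max(range(10), key=lambda k: np.mean(
    [m for c, m in observations if c == k] or [-np.inf]))
print("selected model:", best)
```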


Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions

arXiv.org Machine Learning

Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content in a sequential manner. Providing and evaluating good sequences of recommendations is therefore a central problem for these services. Prior reweighting-based counterfactual evaluation methods either suffer from high variance or make strong independence assumptions about rewards. We propose a new counterfactual estimator that allows for sequential interactions in the rewards with lower variance in an asymptotically unbiased manner. Our method uses graphical assumptions about the causal relationships of the slate to reweight the rewards in the logging policy in a way that approximates the expected sum of rewards under the target policy. Extensive experiments in simulation and on a live recommender system show that our approach outperforms existing methods in terms of bias and data efficiency for the sequential track recommendations problem.

Offline evaluation is challenging because the deployed recommender decides which items the user sees, introducing significant exposure bias in logged data [7, 16, 22]. Various methods have been proposed to mitigate bias using counterfactual evaluation. In this paper, we use terminology from the multi-armed bandit framework to discuss these methods: the recommender performs an action by showing an item depending on the observed context (e.g., user covariates, item covariates, time of day, day of the week) and then observes a reward through the user response (e.g., a stream, a purchase, or length of consumption) [14]. The recommender follows a policy, a distribution over actions, by drawing items stochastically conditioned on the context. The basic idea of counterfactual evaluation is to estimate how a new policy would have performed if it had been deployed instead of the deployed policy.
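
As context for the reweighting idea described above, a minimal sketch of the plain inverse-propensity-scoring (IPS) estimator that counterfactual evaluation builds on, using single-item actions for simplicity. The paper's lower-variance slate estimator with sequential reward interactions is not reproduced here; the policies, rewards, and function names are synthetic assumptions.

```python
# Minimal IPS sketch: reweight logged rewards by the ratio of target-policy to
# logging-policy action probabilities to estimate the target policy's value.
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Estimate the target policy's expected reward from logged interactions."""
    weights = target_probs / logging_probs        # importance weights
    return float(np.mean(weights * rewards))

# Toy logged data: actions drawn by the logging policy with known propensities.
rng = np.random.default_rng(1)
n, n_items = 10_000, 5
logging_policy = np.full(n_items, 1.0 / n_items)     # uniform logging policy
target_policy = np.array([0.4, 0.3, 0.1, 0.1, 0.1])  # policy to evaluate offline
true_reward = np.array([0.9, 0.5, 0.3, 0.2, 0.1])    # expected reward per item

actions = rng.choice(n_items, size=n, p=logging_policy)
rewards = rng.binomial(1, true_reward[actions])

est = ips_estimate(rewards, logging_policy[actions], target_policy[actions])
print("IPS estimate:", est, "| true value:", float(target_policy @ true_reward))
```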


A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation

arXiv.org Machine Learning

This study presents a systematic comparison of methods for individual treatment assignment, a general problem that arises in many applications and has received significant attention from economists, computer scientists, and social scientists. We group the various methods proposed in the literature into three general approaches: learning models to predict outcomes, learning models to predict causal effects, and learning models to predict optimal treatment assignments. We show analytically that optimizing for outcome or causal-effect prediction is not the same as optimizing for treatment assignments, and thus we should prefer learning models that optimize for treatment assignments. We then compare and contrast the three approaches empirically in the context of choosing, for each user, the best algorithm for playlist generation in order to optimize engagement. This is the first comparison of the different treatment assignment approaches on a real-world application at scale (based on more than half a billion individual treatment assignments). Our results show (i) that applying different algorithms to different users can improve streams substantially compared to deploying the same algorithm for everyone, (ii) that personalized assignments improve substantially with larger data sets, and (iii) that learning models by optimizing treatment assignments rather than outcome or causal-effect predictions can improve treatment assignment performance by more than 28%.
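
A hedged sketch contrasting two of the three approaches the abstract distinguishes, on synthetic data: an outcome model (a single regression with the treatment as a feature) versus a causal-effect model (one regression per arm, assigning on the predicted uplift). The third approach, directly optimizing assignments, is not shown, and the data-generating process, model choices, and variable names are illustrative assumptions rather than the paper's setup.

```python
# Illustrative comparison on synthetic data of outcome modelling vs.
# causal-effect modelling for per-user treatment assignment.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 5_000
X = rng.normal(size=(n, 3))                      # user covariates
t = rng.integers(0, 2, size=n)                   # randomly assigned algorithm
effect = X[:, 0]                                 # true per-user uplift of t=1
y = X @ np.array([0.5, 0.2, -0.1]) + t * effect + rng.normal(scale=0.5, size=n)

# (a) Outcome modelling: one regression over covariates plus the treatment
# indicator. Without an interaction term it predicts the same uplift for every
# user, so it cannot personalize the assignment.
outcome_model = LinearRegression().fit(np.column_stack([X, t]), y)
y1 = outcome_model.predict(np.column_stack([X, np.ones(n)]))
y0 = outcome_model.predict(np.column_stack([X, np.zeros(n)]))
assign_outcome = (y1 > y0).astype(int)

# (b) Causal-effect modelling: separate regressions per arm, assign on the
# predicted difference (uplift).
m0 = LinearRegression().fit(X[t == 0], y[t == 0])
m1 = LinearRegression().fit(X[t == 1], y[t == 1])
assign_effect = (m1.predict(X) - m0.predict(X) > 0).astype(int)

oracle = (effect > 0).astype(int)                # best possible assignment
print("agreement with oracle assignment:",
      round((assign_outcome == oracle).mean(), 3),
      round((assign_effect == oracle).mean(), 3))
```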