Singh, Shaun
Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem
Gal, Elena, Singh, Shaun, Pacchiano, Aldo, Walker, Ben, Lyons, Terry, Foerster, Jakob
In many real-world settings, binary classification decisions are made based on limited data in near real-time, e.g. when assessing a loan application. We focus on a class of these problems that share a common feature: the true label is only observed when a data point is assigned a positive label by the principal, e.g. we only find out whether an applicant defaults if we accept their loan application. As a consequence, false rejections become self-reinforcing and cause the labelled training set, which is continuously updated by the model's decisions, to accumulate bias. Prior work mitigates this effect by injecting optimism into the model; however, this comes at the cost of an increased false acceptance rate. We introduce adversarial optimism (AdOpt) to directly address bias in the training set using adversarial domain adaptation. The goal of AdOpt is to learn an unbiased but informative representation of past data by reducing the distributional shift between the set of accepted data points and all data points seen thus far. AdOpt significantly exceeds state-of-the-art performance on a set of challenging benchmark problems. Our experiments also provide initial evidence that the introduction of adversarial domain adaptation improves fairness in this setting.
Practical Policy Optimization with Personalized Experimentation
Garrard, Mia, Wang, Hanson, Letham, Ben, Singh, Shaun, Kazerouni, Abbas, Tan, Sarah, Wang, Zehui, Huang, Yin, Hu, Yichun, Zhou, Chad, Zhou, Norm, Bakshy, Eytan
Many organizations measure treatment effects via an experimentation platform to evaluate the causal effect of product variations prior to full-scale deployment. However, standard experimentation platforms do not perform optimally for end user populations that exhibit heterogeneous treatment effects (HTEs). Here we present a personalized experimentation framework, Personalized Experiments (PEX), which selects treatment group assignments at the user level via HTE modeling and sequential decision policy optimization, in order to optimize multiple short-term and long-term outcomes simultaneously. We describe an end-to-end workflow that has proven to be successful in practice and can be readily implemented using open-source software.
Distilling Heterogeneity: From Explanations of Heterogeneous Treatment Effect Models to Interpretable Policies
Wu, Han, Tan, Sarah, Li, Weiwei, Garrard, Mia, Obeng, Adam, Dimmery, Drew, Singh, Shaun, Wang, Hanson, Jiang, Daniel, Bakshy, Eytan
Internet companies are increasingly using machine learning models to create personalized policies that assign each individual the treatment with the best predicted outcome for that individual. These policies are frequently derived from black-box heterogeneous treatment effect (HTE) models that predict individual-level treatment effects. In this paper, we focus on (1) learning explanations for HTE models, and (2) learning interpretable policies that prescribe treatment assignments. We also propose guidance trees, an approach to ensembling multiple interpretable policies without loss of interpretability. These rule-based interpretable policies are easy to deploy and avoid the need to maintain an HTE model in a production environment.
Looper: An end-to-end ML platform for product decisions
Markov, Igor L., Wang, Hanson, Kasturi, Nitya, Singh, Shaun, Yuen, Sze Wai, Garrard, Mia, Tran, Sarah, Huang, Yin, Wang, Zehui, Glotov, Igor, Gupta, Tanvi, Huang, Boshuang, Chen, Peng, Xie, Xiaowen, Belkin, Michael, Uryasev, Sal, Howie, Sam, Bakshy, Eytan, Zhou, Norm
Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users and systems, e.g., compute infrastructure. For broader adoption, this practice must (i) accommodate software engineers without ML backgrounds, and (ii) provide mechanisms to optimize for product goals. In this work, we describe general principles and a specific end-to-end ML platform, Looper, which offers easy-to-use APIs for decision-making and feedback collection. Looper supports the full end-to-end ML lifecycle, from online data collection to model training, deployment, and inference, and extends support to evaluation and tuning against product goals. We outline the platform architecture and the overall impact of production deployment. We also describe the learning curve and summarize experiences from platform adopters.
Real-world Video Adaptation with Reinforcement Learning
Mao, Hongzi, Chen, Shannon, Dimmery, Drew, Singh, Shaun, Blaisdell, Drew, Tian, Yuandong, Alizadeh, Mohammad, Bakshy, Eytan
Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE). We evaluate recently proposed RL-based ABR methods in Facebook's web-based video streaming platform. Real-world ABR poses several challenges that require customized designs beyond off-the-shelf RL algorithms -- we implement a scalable neural network architecture that supports videos with arbitrary bitrate encodings; we design a training method to cope with the variance resulting from the stochasticity in network conditions; and we leverage constrained Bayesian optimization for reward shaping in order to optimize the conflicting QoE objectives. In a week-long worldwide deployment with more than 30 million video streaming sessions, our RL approach outperforms the existing human-engineered ABR algorithms.