Fazelnia, Ghazal
Stochastic Variational Inference with Tuneable Stochastic Annealing
Paisley, John, Fazelnia, Ghazal, Barr, Brian
In this paper, we exploit the observation that stochastic variational inference (SVI) is a form of annealing and present a modified SVI approach -- applicable to both large and small datasets -- that allows the amount of annealing done by SVI to be tuned. We are motivated by the fact that, in SVI, the larger the batch size, the more approximately Gaussian the intrinsic noise becomes, but the smaller its variance. This low variance reduces the amount of annealing needed to escape bad local optima. We propose a simple method for achieving both goals: larger-variance noise to escape bad local optima, and more data information to obtain more accurate gradient directions. The idea is to set an actual batch size, which may be the size of the data set, and a smaller effective batch size; the gradient noise is then matched to the larger variance it would have at this smaller batch size. The result is an approximation to the maximum-entropy stochastic gradient at this variance level. We theoretically motivate our approach for the framework of conjugate exponential family models and illustrate the method empirically on probabilistic matrix factorization for collaborative filtering, the Latent Dirichlet Allocation topic model, and the Gaussian mixture model.
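The batch-size-matching idea above can be sketched numerically: a minibatch gradient averaged over B samples has variance roughly sigma^2/B, so to mimic a smaller effective batch size b we can add Gaussian noise with variance sigma^2 (1/b - 1/B). A minimal illustrative sketch (the function name and exact noise model are assumptions, not the paper's implementation):

```python
import numpy as np

def annealed_gradient(per_sample_grads, b_eff, rng):
    """Given per-sample gradients from an actual batch of size B, return a
    gradient whose noise level matches a smaller effective batch size b_eff.
    Illustrative sketch only; assumes independent, roughly Gaussian noise."""
    B = per_sample_grads.shape[0]
    g_bar = per_sample_grads.mean(axis=0)          # batch-mean gradient, variance ~ sigma^2 / B
    sigma2 = per_sample_grads.var(axis=0, ddof=1)  # per-sample gradient variance estimate
    extra_var = sigma2 * (1.0 / b_eff - 1.0 / B)   # top up to variance sigma^2 / b_eff
    noise = rng.normal(0.0, np.sqrt(np.maximum(extra_var, 0.0)))
    return g_bar + noise
```

Setting b_eff equal to the actual batch size adds no noise and recovers the ordinary minibatch gradient; shrinking b_eff increases the annealing while still using the full batch for the gradient direction.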
Generalized User Representations for Transfer Learning
Fazelnia, Ghazal, Gupta, Sanket, Keum, Claire, Koh, Mark, Anderson, Ian, Lalmas, Mounia
We present a novel framework for user representation in large-scale recommender systems, aiming to effectively represent diverse user tastes in a generalized manner. Our approach employs a two-stage methodology combining representation learning and transfer learning. The representation learning model uses an autoencoder that compresses various user features into a representation space. In the second stage, downstream task-specific models leverage the user representations via transfer learning instead of curating user features individually. We further augment the representation's input features to increase flexibility and enable reaction to user events, including new-user experiences, in near-real time. Additionally, we propose a novel solution for managing the deployment of this framework in production models, allowing downstream models to work independently. We validate the performance of our framework through rigorous offline and online experiments within a large-scale system, showcasing its efficacy across multiple evaluation tasks. Finally, we show how the proposed framework can significantly reduce infrastructure costs compared to alternative approaches.
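The first stage described above can be illustrated with a minimal linear autoencoder that compresses a user-feature matrix into a low-dimensional representation space. Everything here (dimensions, learning rate, the plain linear architecture) is an illustrative assumption, not the production model:

```python
import numpy as np

def train_autoencoder(X, d=4, lr=0.01, n_iters=500, seed=0):
    """Toy linear autoencoder: learn encoder/decoder weights so that
    Z = X @ W_enc is a compact user representation and Z @ W_dec
    reconstructs the original features. Hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W_enc = rng.normal(scale=0.1, size=(p, d))
    W_dec = rng.normal(scale=0.1, size=(d, p))
    losses = []
    for _ in range(n_iters):
        Z = X @ W_enc              # d-dimensional user representations
        X_hat = Z @ W_dec          # reconstruction of the input features
        err = X_hat - X
        losses.append(float((err ** 2).mean()))
        g_dec = Z.T @ err / n      # gradient of squared error wrt decoder
        g_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc, W_dec, losses
```

In the second stage, a downstream model would consume `X @ W_enc` as its user input rather than the raw features, which is the transfer-learning step the abstract describes.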
Model Selection for Production System via Automated Online Experiments
Dai, Zhenwen, Chandar, Praveen, Fazelnia, Ghazal, Carterette, Ben, Lalmas-Roelleke, Mounia
A challenge that machine learning practitioners face in industry is selecting the best model to deploy in production. As a model is often an intermediate component of a production system, online controlled experiments such as A/B tests yield the most reliable estimate of the effectiveness of the whole system, but can only compare two or a few models due to budget constraints. We propose an automated online experimentation mechanism that can efficiently perform model selection from a large pool of models with a small number of online experiments. We derive the probability distribution of the metric of interest, including the model uncertainty, from a Bayesian surrogate model trained on historical logs. Our method efficiently identifies the best model by sequentially selecting and deploying a list of models from the candidate set that balances exploration and exploitation. Using simulations based on real data, we demonstrate the effectiveness of our method on two different tasks.
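The sequential select-deploy-update loop can be sketched with a Thompson-sampling-style procedure: keep a Gaussian belief over each candidate's online metric, deploy the candidate with the highest sampled metric, and update the belief from the noisy observation. This is a hypothetical stand-in for the paper's Bayesian surrogate, with made-up names and noise levels:

```python
import numpy as np

def select_model(true_metrics, n_rounds, obs_noise=0.05, seed=0):
    """Sequential model selection sketch balancing exploration and
    exploitation. true_metrics simulates each model's unknown online
    metric; the Gaussian belief is an illustrative surrogate."""
    rng = np.random.default_rng(seed)
    K = len(true_metrics)
    mu = np.zeros(K)    # posterior means of each model's metric
    var = np.ones(K)    # posterior variances
    for _ in range(n_rounds):
        k = int(np.argmax(rng.normal(mu, np.sqrt(var))))  # sample a belief, deploy its argmax
        y = true_metrics[k] + rng.normal(0.0, obs_noise)  # noisy online observation
        prec = 1.0 / var[k] + 1.0 / obs_noise**2          # conjugate Gaussian update
        mu[k] = (mu[k] / var[k] + y / obs_noise**2) / prec
        var[k] = 1.0 / prec
    return int(np.argmax(mu))
```

Models with high uncertainty get sampled optimistically early on (exploration), while models whose beliefs concentrate around high metrics keep getting deployed (exploitation).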
Trajectory Based Podcast Recommendation
Benton, Greg, Fazelnia, Ghazal, Wang, Alice, Carterette, Ben
Podcast recommendation is a growing area of research that presents new challenges and opportunities. Individuals interact with podcasts in a way that is distinct from most other media and, most relevant to our concerns, distinct from music consumption. We show that successful and consistent recommendations can be made by viewing users as moving through the podcast library sequentially. Recommendations for future podcasts are then made using the trajectory implied by this sequential behavior. Our experiments provide evidence that user behavior is confined to local trends, and that listening patterns tend to be found over short sequences of similar types of shows. Ultimately, our approach gives a 450% increase in effectiveness over a collaborative filtering baseline.
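The trajectory view can be illustrated in an embedding space: treat a user's listening history as a path, extrapolate one step in the average direction of movement, and recommend the shows nearest the extrapolated point. This is a simplified sketch under assumed embeddings, not the paper's exact model:

```python
import numpy as np

def recommend_next(trajectory, catalog, k=3):
    """Trajectory-based recommendation sketch: trajectory is the user's
    listening history as show embeddings (in order); catalog holds the
    embeddings of all candidate shows. Returns indices of the k shows
    nearest the extrapolated next position."""
    traj = np.asarray(trajectory, dtype=float)
    step = np.diff(traj, axis=0).mean(axis=0)   # average direction of movement
    target = traj[-1] + step                    # extrapolated next position
    dists = np.linalg.norm(catalog - target, axis=1)
    return np.argsort(dists)[:k]
```

Because the prediction extrapolates from recent movement, it naturally captures the local trends the abstract describes: the next recommendation stays close to the short sequence of similar shows the user has just been listening to.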
Mixed Membership Recurrent Neural Networks
Fazelnia, Ghazal, Ibrahim, Mark, Modarres, Ceena, Wu, Kevin, Paisley, John
Recurrent neural networks (RNNs) have become one of the standard models in sequential data analysis [Rumelhart et al., 1986, Elman, 1990]. At each time step of the RNN, an observation is modeled via a neural network using the observations and hidden states from previous time points. Models such as the RNN, and also the hidden Markov model among others, often implicitly assume a sequence has a fixed time interval between observations. They also often do not account for group-level effects when multiple sequences are observed and each sequence belongs to one of multiple groups. For example, consider data in the form of a sequence of discrete counts by a set of groups -- e.g., a sequence of purchases (market baskets) for a set of customers, with one sequence per customer. A vanilla RNN implementation would model these sequences using a network with the same parameters, which removes the customer-level information, and according to an enumerated indexing, which removes the time interval information between orders. However, this information is important: customer-specific effects can improve predictive performance for each customer, while an interval of one day versus one month between orders significantly impacts the items likely to be purchased next.
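The two missing ingredients identified above, elapsed time between observations and group-level effects, can be sketched as extra inputs to an otherwise vanilla RNN cell. All names, shapes, and the log-time feature below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def mm_rnn_step(x_t, h_prev, dt, group_emb, params):
    """One RNN step augmented with (1) the elapsed time dt since the
    previous observation and (2) a learned group (e.g., customer)
    embedding, so that both signals influence the hidden state."""
    Wx, Wh, Wt, Wg, b = params
    pre = (Wx @ x_t                 # current observation
           + Wh @ h_prev            # previous hidden state
           + Wt * np.log1p(dt)      # time-interval feature (illustrative choice)
           + Wg @ group_emb         # customer-level effect
           + b)
    return np.tanh(pre)
```

A one-day versus one-month gap between orders now changes the hidden state through the `dt` term, and two customers with identical histories can still receive different predictions through their group embeddings.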