The Many Faces of Exponential Weights in Online Learning Machine Learning

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.

Neural Educational Recommendation Engine (NERE) Machine Learning

Quizlet is the most popular online learning tool in the United States, and is used by over 2/3 of high school students, and 1/2 of college students. With more than 95% of Quizlet users reporting improved grades as a result, the platform has become the de-facto tool used in millions of classrooms. In this paper, we explore the task of recommending suitable content for a student to study, given their prior interests, as well as what their peers are studying. We propose a novel approach, i.e. Neural Educational Recommendation Engine (NERE), to recommend educational content by leveraging student behaviors rather than ratings. We have found that this approach better captures social factors that are more aligned with learning. NERE is based on a recurrent neural network that includes collaborative and content-based approaches for recommendation, and takes into account any particular student's speed, mastery, and experience to recommend the appropriate task. We train NERE by jointly learning the user embeddings and content embeddings, and attempt to predict the content embedding for the final timestamp. We also develop a confidence estimator for our neural network, which is a crucial requirement for productionizing this model. We apply NERE to Quizlet's proprietary dataset, and present our results. We achieved an R^2 score of 0.81 in the content embedding space, and a recall score of 54% on our 100 nearest neighbors. This vastly exceeds the recall@100 score of 12% that a standard matrix-factorization approach provides. We conclude with a discussion on how NERE will be deployed, and position our work as one of the first educational recommender systems for the K-12 space.

Bayesian Statistics: Techniques and Models Coursera


About this course: This is the second of a two-course sequence introducing the fundamentals of Bayesian statistics. It builds on the course Bayesian Statistics: From Concept to Data Analysis, which introduces Bayesian methods through use of simple conjugate models. Real-world data often require more sophisticated models to reach realistic conclusions. This course aims to expand our "Bayesian toolbox" with more general models, and computational techniques to fit them. In particular, we will introduce Markov chain Monte Carlo (MCMC) methods, which allow sampling from posterior distributions that have no analytical solution.

Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Neural Information Processing Systems

Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm -- random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency -- orders of magnitude speed-up compared to the state-of-the-art.

Modeling Player Engagement with Bayesian Hierarchical Models

AAAI Conferences

Modeling player engagement is a key challenge in games. However, the gameplay signatures of engaged players can be highly context-sensitive, varying based on where the game is used or what population of players is using it. Traditionally, models of player engagement are investigated in a particular context, and it is unclear how effectively these models generalize to other settings and populations. In this work, we investigate a Bayesian hierarchical linear model for multi-task learning to devise a model of player engagement from a pair of datasets that were gathered in two complementary contexts: a Classroom Study with middle school students and a Laboratory Study with undergraduate students. Both groups of players used similar versions of Crystal Island, an educational interactive narrative game for science learning. Results indicate that the Bayesian hierarchical model outperforms both pooled and context-specific models in cross-validation measures of predicting player motivation from in-game behaviors, particularly for the smaller Classroom Study group. Further, we find that the posterior distributions of model parameters indicate that the coefficient for a measure of gameplay performance significantly differs between groups. Drawing upon their capacity to share information across groups, hierarchical Bayesian methods provide an effective approach for modeling player engagement with data from similar, but different, contexts.