The Many Faces of Exponential Weights in Online Learning

arXiv.org Machine Learning

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.


Modeling Player Engagement with Bayesian Hierarchical Models

AAAI Conferences

Modeling player engagement is a key challenge in games. However, the gameplay signatures of engaged players can be highly context-sensitive, varying based on where the game is used or what population of players is using it. Traditionally, models of player engagement are investigated in a particular context, and it is unclear how effectively these models generalize to other settings and populations. In this work, we investigate a Bayesian hierarchical linear model for multi-task learning to devise a model of player engagement from a pair of datasets that were gathered in two complementary contexts: a Classroom Study with middle school students and a Laboratory Study with undergraduate students. Both groups of players used similar versions of Crystal Island, an educational interactive narrative game for science learning. Results indicate that the Bayesian hierarchical model outperforms both pooled and context-specific models in cross-validation measures of predicting player motivation from in-game behaviors, particularly for the smaller Classroom Study group. Further, we find that the posterior distributions of model parameters indicate that the coefficient for a measure of gameplay performance significantly differs between groups. Drawing upon their capacity to share information across groups, hierarchical Bayesian methods provide an effective approach for modeling player engagement with data from similar, but different, contexts.


Bayesian Statistics Coursera

@machinelearnbot

About this course: This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes' rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing in R (free statistical software) the final posterior distribution. Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction. We assume learners in this course have background knowledge equivalent to what is covered in the earlier three courses in this specialization: "Introduction to Probability and Data," "Inferential Statistics," and "Linear Regression and Modeling."


Bayesian Statistics: Techniques and Models Coursera

@machinelearnbot

About this course: This is the second of a two-course sequence introducing the fundamentals of Bayesian statistics. It builds on the course Bayesian Statistics: From Concept to Data Analysis, which introduces Bayesian methods through use of simple conjugate models. Real-world data often require more sophisticated models to reach realistic conclusions. This course aims to expand our "Bayesian toolbox" with more general models, and computational techniques to fit them. In particular, we will introduce Markov chain Monte Carlo (MCMC) methods, which allow sampling from posterior distributions that have no analytical solution.


Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation

Neural Information Processing Systems

Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm -- random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency -- orders of magnitude speed-up compared to the state-of-the-art.