Worst-Case Bounds for Gaussian Process Models

Neural Information Processing Systems

Dean P. Foster University of Pennsylvania We present a competitive analysis of some nonparametric Bayesian algorithms ina worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) andprovide bounds on the regret (under the log loss) for commonly usednon-parametric Bayesian algorithms -- including Gaussian regression and logistic regression -- which show how these algorithms can perform favorably under rather general conditions.


AAAI Conferences

This paper describes an effort to measure the effectiveness of tutor help in an intelligent tutoring system. Although conventional pre-and post-test experiments can determine whether tutor help is effective, they are expensive to conduct. Furthermore, pre-and post-test experiments often do not model student knowledge explicitly and thus are ignoring a source of information: students often request help about words they do not know. Therefore, we construct a dynamic Bayes net (which we call the Help model) that models tutor help and student knowledge in one coherent framework. The Help model distinguishes two different effects of help: scaffolding immediate performance vs. teaching persistent knowledge that improves long term performance. We train the Help model to fit student performance data gathered from usage of the Reading Tutor (Mostow & Aist, 2001). The parameters of the trained model suggest that students benefit from both the scaffolding and teaching effects of help. That is, students are more likely to perform correctly on the current attempt and learn persistent knowledge if tutor help is provided. Thus, our framework is able to distinguish two types of influence that tutor help has on the student, and can determine whether help helps learning without an explicit controlled study.

Artificial Intelligence and Risk Communication

AAAI Conferences

The challenges of effective health risk communication are well known. This paper provides pointers to the health communication literature that discuss these problems. Tailoring printed information, visual displays, and interactive multimedia have been proposed in the health communication literature as promising approaches. On-line risk communication applications are increasing on the internet. However, potential effectiveness of applications using conventional computer technology is limited. We propose that use of artificial intelligence, building upon research in Intelligent Tutoring Systems, might be able to overcome these limitations.

The Many Faces of Exponential Weights in Online Learning

arXiv.org Machine Learning

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.

Sparse Factor Analysis for Learning and Content Analytics

arXiv.org Machine Learning

We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner's knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question's intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest.