An Online Learning Method for Improving Over-subscription Planning

AAAI Conferences

Despite the recent resurgence of interest in learning methods for planning, most such efforts are still focused exclusively on classical planning problems. In this work, we investigate the effectiveness of learning approaches for improving over-subscription planning, a problem that has received significant recent interest. Viewing over-subscription planning as a domain-independent optimization problem, we adapt the STAGE (Boyan and Moore 2000) approach to learn and improve the plan search. The key challenge in our study is how to automate the feature generation process. We developed and experimented with a relational feature set based on Taxonomic syntax, as well as a propositional feature set based on ground facts. Both the feature generation and training-data generation processes are automatic, making ours a completely domain-independent optimization process that takes advantage of online learning. In empirical studies, our proposed approach improved upon the baseline planner for over-subscription planning on many of the benchmark problems.
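
The STAGE approach the abstract adapts alternates between two hill-climbing searches: one on the true objective, and one on a learned value function that predicts where search from a given state will end up. Below is a minimal Python sketch of that loop, written here for a maximization objective (over-subscription planning maximizes net benefit); the callables `neighbors`, `objective`, `features`, and `fit_regressor` are hypothetical interfaces supplied by the caller, not part of the paper's system.

```python
def stage(initial_state, neighbors, objective, features, fit_regressor, iters=20):
    """Sketch of the STAGE loop (Boyan and Moore 2000): alternate local
    search on the true objective with local search on a learned predictor
    of that search's eventual outcome. All helper callables are assumed."""
    data = []                      # (feature-vector, outcome) training pairs
    state, best = initial_state, initial_state

    def hill_climb(s, score):
        trajectory = [s]
        while True:
            nbrs = neighbors(s)
            if not nbrs:
                break
            n = max(nbrs, key=score)
            if score(n) <= score(s):  # local optimum reached
                break
            s = n
            trajectory.append(s)
        return s, trajectory

    for _ in range(iters):
        # Phase 1: ordinary hill-climbing on the true objective.
        end, traj = hill_climb(state, objective)
        if objective(end) > objective(best):
            best = end
        # Label every visited state with the outcome it eventually led to.
        data.extend((features(s), objective(end)) for s in traj)
        # Phase 2: fit V(features(s)) ~ outcome, e.g. a linear regressor,
        # then hill-climb on V to choose a promising restart state.
        v = fit_regressor(data)
        state, _ = hill_climb(end, lambda s: v(features(s)))
    return best
```

The learned restart phase is what lets online experience steer subsequent plan search, which matches the abstract's framing of the method as a domain-independent optimization that learns as it runs.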


Procedural help in Andes: Generating hints using a Bayesian network student model

AAAI Conferences

One of the most important problems for an intelligent tutoring system is deciding how to respond when a student asks for help. Responding cooperatively requires an understanding both of what solution path the student is pursuing and of the student's current level of domain knowledge. Andes, an intelligent tutoring system for Newtonian physics, consults a probabilistic student model to make decisions about responding to help requests. Andes' student model uses a Bayesian network that computes a probabilistic assessment of three kinds of information: (1) the student's general knowledge about physics, (2) the student's specific knowledge about the current problem, and (3) the abstract plans that the student may be pursuing to solve the problem. Using this model, Andes provides feedback and hints tailored to the student's knowledge and goals.
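
To make the probabilistic-assessment idea concrete, here is a toy Bayes-rule update on a single two-node fragment (Knows-Rule -> Correct-Step) of a student model. The slip and guess parameters and the structure are illustrative assumptions only; Andes' actual network chains many rule, fact, and goal nodes together rather than updating one rule in isolation.

```python
def posterior_knows(prior, slip=0.1, guess=0.2, correct=True):
    """Bayes-rule update for P(student knows the rule | observed step).
    slip  = P(incorrect step | knows the rule)   -- assumed value
    guess = P(correct step  | does not know it)  -- assumed value"""
    if correct:
        num = (1.0 - slip) * prior
        den = num + guess * (1.0 - prior)
    else:
        num = slip * prior
        den = num + (1.0 - guess) * (1.0 - prior)
    return num / den

# e.g. posterior_knows(0.5, correct=True) -> ~0.82:
# one correct step raises the mastery estimate from 0.5 to about 0.82.
```

A help request is then answered by reading such posteriors off the network: low mastery of the rule on the inferred solution path suggests a more explicit hint.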


Online Learning of k-CNF Boolean Functions

AAAI Conferences

This paper revisits the problem of learning a k-CNF Boolean function from examples, for fixed k, in the context of online learning under the logarithmic loss. We give a Bayesian interpretation to one of Valiant’s classic PAC learning algorithms, which we then build upon to derive three efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods, and show that the cumulative log-loss can be upper bounded by a polynomial function of the size of each example.
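
The classic Valiant algorithm the abstract builds on can be stated in a few lines: keep every clause of at most k literals that has not yet been falsified by a positive example, and predict with the conjunction of the survivors. The sketch below shows that deterministic elimination core; the paper's contribution layers probabilistic prediction with log-loss guarantees on top of it, and the class and helper names here are illustrative, not from the paper.

```python
from itertools import combinations

def all_k_clauses(n, k):
    """All non-tautological disjunctions of at most k literals over n
    Boolean variables. A literal is (index, sign)."""
    lits = [(i, s) for i in range(n) for s in (True, False)]
    clauses = set()
    for size in range(1, k + 1):
        for combo in combinations(lits, size):
            if len({i for i, _ in combo}) == size:  # skip x_i OR not-x_i
                clauses.add(frozenset(combo))
    return clauses

def satisfies(clause, x):
    return any(x[i] == s for i, s in clause)

class ValiantKCNF:
    """Valiant-style clause elimination for k-CNF: the hypothesis is the
    conjunction of all clauses consistent with the positive examples."""
    def __init__(self, n, k):
        self.clauses = all_k_clauses(n, k)

    def predict(self, x):
        return all(satisfies(c, x) for c in self.clauses)

    def update(self, x, label):
        if label:  # only positive examples shrink the version space
            self.clauses = {c for c in self.clauses if satisfies(c, x)}
```

Viewing the surviving clause set as a version space is what admits the Bayesian reading: a prior over k-CNF formulas conditioned on the examples seen so far yields probabilistic predictions whose cumulative log-loss the paper bounds.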


The Many Faces of Exponential Weights in Online Learning

arXiv.org Machine Learning

A standard introduction to online learning might place Online Gradient Descent at its center and then proceed to develop generalizations and extensions like Online Mirror Descent and second-order methods. Here we explore the alternative approach of putting exponential weights (EW) first. We show that many standard methods and their regret bounds then follow as a special case by plugging in suitable surrogate losses and playing the EW posterior mean. For instance, we easily recover Online Gradient Descent by using EW with a Gaussian prior on linearized losses, and, more generally, all instances of Online Mirror Descent based on regular Bregman divergences also correspond to EW with a prior that depends on the mirror map. Furthermore, appropriate quadratic surrogate losses naturally give rise to Online Gradient Descent for strongly convex losses and to Online Newton Step. We further interpret several recent adaptive methods (iProd, Squint, and a variation of Coin Betting for experts) as a series of closely related reductions to exp-concave surrogate losses that are then handled by Exponential Weights. Finally, a benefit of our EW interpretation is that it opens up the possibility of sampling from the EW posterior distribution instead of playing the mean. As already observed by Bubeck and Eldan, this recovers the best-known rate in Online Bandit Linear Optimization.
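
The common primitive underlying all of these reductions is the exponential-weights update itself: weight each expert by the exponential of its negated cumulative loss and play the posterior mean. A minimal sketch for the finite-experts setting follows; the array shapes and step size are illustrative assumptions, and the surrogate-loss constructions the abstract describes would replace the raw per-expert losses fed in here.

```python
import numpy as np

def ew_posterior_mean(expert_preds, expert_losses, eta=0.5):
    """Exponential weights over N experts, playing the posterior mean.
    expert_preds:  (T, N) array, each expert's prediction per round
    expert_losses: (T, N) array, each expert's loss per round
    eta:           learning rate (assumed fixed here for simplicity)"""
    T, N = expert_preds.shape
    cum_loss = np.zeros(N)
    plays = np.empty(T)
    for t in range(T):
        logits = -eta * cum_loss
        w = np.exp(logits - logits.max())   # numerically stable weights
        w /= w.sum()                        # normalize to a posterior
        plays[t] = w @ expert_preds[t]      # play the EW posterior mean
        cum_loss += expert_losses[t]        # observe this round's losses
    return plays
```

Swapping the losses for linearized, quadratic, or exp-concave surrogates, and the finite prior for a continuous one (e.g. Gaussian), is exactly the mechanism by which the paper recovers Online Gradient Descent, Online Newton Step, and the adaptive methods it lists; sampling from `w` instead of averaging gives the bandit variant mentioned at the end.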