Goto

Collaborating Authors

 Learning Graphical Models



Catching Up Faster in Bayesian Model Selection and Model Averaging

Neural Information Processing Systems

Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence thanother methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catchup phenomenon as a novel explanation for the slow convergence of Bayesian methods. Basedon this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm.




Learning Bounds for Domain Adaptation

Neural Information Processing Systems

Empirical risk minimization offers well-known learning guarantees when training and test data come from the same domain. In the real world, though, we often wish to adapt a classifier from a source domain with a large amount of training data to different target domain with very little training data. In this work we give uniform convergence bounds for algorithms that minimize a convex combination of source and target empirical risk. The bounds explicitly model the inherent trade-off between training on a large but inaccurate source data set and a small but accurate target training set. Our theory also gives results when we have multiple source domains, each of which may have a different number of instances, and we exhibit cases in which minimizing a non-uniform combination of source risks can achieve much lower target error than standard empirical risk minimization.


Supervised Topic Models

Neural Information Processing Systems

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.



One-Pass Boosting

Neural Information Processing Systems

This paper studies boosting algorithms that make a single pass over a set of base classifiers. Wefirst analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, butour one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a "picky" variant of AdaBoost thatskips poor base classifiers can outperform the standard AdaBoost algorithm, whichuses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improveon the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.


Random Sampling of States in Dynamic Programming

Neural Information Processing Systems

We combine two threads of research on approximate dynamic programming: random sampling of states and using local trajectory optimizers to globally optimize a policy and associated value function. This combination allows us to replace a dense multidimensional grid with a much sparser adaptive sampling of states. Our focus is on finding steady state policies for the deterministic time invariant discrete time control problems with continuous states and actions often found in robotics. In this paper we show that we can now solve problems we couldn't solve previously with regular grid-based approaches.


A Spectral Regularization Framework for Multi-Task Structure Learning

Neural Information Processing Systems

Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalizationperformance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization withspectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer.