Directed Networks
PAC-Bayesian Learning and Domain Adaptation
Germain, Pascal, Habrard, Amaury, Laviolette, François, Morvant, Emilie
In machine learning, Domain Adaptation (DA) arises when the distribution generating the test (target) data differs from the one generating the learning (source) data. It is well known that DA is a hard task even under strong assumptions, among them covariate shift, where the source and target distributions diverge only in their marginals, i.e. they have the same labeling function. Another popular approach is to consider a hypothesis class that moves the two distributions closer together while implying a low error for both tasks. This is a VC-dim approach that restricts the complexity of a hypothesis class in order to get good generalization. Instead, we propose a PAC-Bayesian approach that seeks suitable weights to be given to each hypothesis in order to build a majority vote. We prove a new DA bound in the PAC-Bayesian context. This leads us to design the first DA-PAC-Bayesian algorithm based on the minimization of the proposed bound. In doing so, we seek a \rho-weighted majority vote that takes into account a trade-off between three quantities. The first two are, as usual in the PAC-Bayesian approach, (a) the complexity of the majority vote (measured by a Kullback-Leibler divergence) and (b) its empirical risk (measured by the \rho-average errors on the source sample). The third quantity is (c) the capacity of the majority vote to distinguish some structural difference between the source and target samples.
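To make quantities (a) and (b) concrete, here is a minimal Python sketch of a \rho-weighted majority vote over a finite set of voters, together with its Kullback-Leibler complexity term and its \rho-average empirical error. The function names, the uniform prior, and the toy predictions are illustrative assumptions, not taken from the paper; quantity (c) is not sketched because the abstract does not define it.

```python
import math


def majority_vote(rho, votes):
    """Sign of the rho-weighted sum of the voters' +1/-1 predictions."""
    score = sum(w * v for w, v in zip(rho, votes))
    return 1 if score >= 0 else -1


def gibbs_risk(rho, predictions, labels):
    """rho-average of each voter's empirical error rate on the sample."""
    n = len(labels)
    risk = 0.0
    for w, preds in zip(rho, predictions):
        errors = sum(p != y for p, y in zip(preds, labels))
        risk += w * errors / n
    return risk


def kl_divergence(rho, prior):
    """KL(rho || prior) for two discrete distributions over the voters."""
    return sum(q * math.log(q / p) for q, p in zip(rho, prior) if q > 0)


# Toy source sample (hypothetical): 4 labels and 3 voters' predictions.
labels = [1, -1, 1, 1]
predictions = [
    [1, -1, 1, -1],
    [1, 1, 1, 1],
    [-1, -1, 1, 1],
]
rho = [0.5, 0.25, 0.25]      # posterior weights on the voters
prior = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform prior

print("empirical rho-average error:", gibbs_risk(rho, predictions, labels))
print("KL(rho || prior):", kl_divergence(rho, prior))
print("vote on one example:", majority_vote(rho, [1, 1, -1]))
```

A bound-minimizing algorithm would search over `rho` to trade these terms off; the search itself is outside the scope of this sketch.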
Evaluating Classifiers Without Expert Labels
Jung, Hyun Joon, Lease, Matthew
This paper considers the challenge of evaluating a set of classifiers, as done in shared task evaluations like the KDD Cup or NIST TREC, without expert labels. While expert labels provide the traditional cornerstone for evaluating statistical learners, limited or expensive access to experts represents a practical bottleneck. Instead, we seek methodology for estimating performance of the classifiers (relative and absolute) which is more scalable than expert labeling yet preserves high correlation with evaluation based on expert labels. We consider both: 1) using only labels automatically generated by the classifiers themselves (blind evaluation); and 2) using labels obtained via crowdsourcing. While crowdsourcing methods are lauded for scalability, using such data for evaluation raises serious concerns given the prevalence of label noise. In regard to blind evaluation, two broad strategies are investigated: combine & score and score & combine. Combine & score methods infer a single "pseudo-gold" label set by aggregating classifier labels; classifiers are then evaluated based on this single pseudo-gold label set. On the other hand, score & combine methods: i) sample multiple label sets from classifier outputs, ii) evaluate classifiers on each label set, and iii) average classifier performance across label sets. When additional crowd labels are also collected, we investigate two alternative avenues for exploiting them: 1) direct evaluation of classifiers; or 2) supervision of combine & score methods. To assess generality of our techniques, classifier performance is measured using four common classification metrics, with statistical significance tests establishing relative performance of the classifiers for each metric. Finally, we measure both score and rank correlations between estimated and actual classifier performance according to expert judgments.
Rigorous evaluation of classifiers from the TREC 2011 Crowdsourcing Track shows reliable evaluation can be achieved without reliance on expert labels.
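The two blind-evaluation strategies can be contrasted with a small Python sketch. It assumes binary labels, majority voting as the aggregation rule for combine & score, and uniform per-item sampling among the classifiers' labels for score & combine; these choices, the function names, and the toy outputs are illustrative assumptions rather than the specific methods evaluated in the paper.

```python
import random
from collections import Counter


def accuracy(pred, gold):
    """Fraction of items on which two label lists agree."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def combine_and_score(outputs):
    """Build one pseudo-gold label set by majority vote, then score against it."""
    pseudo_gold = [Counter(col).most_common(1)[0][0] for col in zip(*outputs)]
    return [accuracy(o, pseudo_gold) for o in outputs]


def score_and_combine(outputs, n_samples=200, seed=0):
    """Sample many label sets, score on each, and average the scores."""
    rng = random.Random(seed)
    totals = [0.0] * len(outputs)
    for _ in range(n_samples):
        # Assumed sampling scheme: for each item, pick one classifier's label.
        sampled = [rng.choice(col) for col in zip(*outputs)]
        for i, o in enumerate(outputs):
            totals[i] += accuracy(o, sampled)
    return [t / n_samples for t in totals]


# Hypothetical outputs of three classifiers on five items.
outputs = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
]
print("combine & score:", combine_and_score(outputs))
print("score & combine:", score_and_combine(outputs))
```

Both strategies rank the classifier that most often agrees with the others highest, which is the main limitation of blind evaluation: consensus stands in for correctness.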
Simulation-based optimal Bayesian experimental design for nonlinear systems
Huan, Xun, Marzouk, Youssef M.
The optimal selection of experimental conditions is essential to maximizing the value of data for inference and prediction, particularly in situations where experiments are time-consuming and expensive to conduct. We propose a general mathematical framework and an algorithmic approach for optimal experimental design with nonlinear simulation-based models; in particular, we focus on finding sets of experiments that provide the most information about targeted sets of parameters. Our framework employs a Bayesian statistical setting, which provides a foundation for inference from noisy, indirect, and incomplete data, and a natural mechanism for incorporating heterogeneous sources of information. An objective function is constructed from information theoretic measures, reflecting expected information gain from proposed combinations of experiments. Polynomial chaos approximations and a two-stage Monte Carlo sampling method are used to evaluate the expected information gain. Stochastic approximation algorithms are then used to make optimization feasible in computationally intensive and high-dimensional settings. These algorithms are demonstrated on model problems and on nonlinear parameter estimation problems arising in detailed combustion kinetics.
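As an illustration of estimating expected information gain by two-stage Monte Carlo, the sketch below uses one common nested estimator: an outer average over simulated (parameter, data) pairs and an inner average over prior samples to approximate the evidence. The toy model (a scalar linear-Gaussian observation y = d * theta + noise with a standard normal prior), the reuse of inner samples across outer iterations, and all names are assumptions made for the example; the paper's simulation-based models and polynomial chaos surrogates are not reproduced here.

```python
import math
import random


def log_likelihood(y, theta, d, sigma):
    """Gaussian log-density of y with mean d * theta and std sigma."""
    z = (y - d * theta) / sigma
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * z * z


def expected_information_gain(d, sigma=1.0, n_outer=400, n_inner=400, seed=0):
    """Nested Monte Carlo estimate of E[log p(y|theta,d) - log p(y|d)]."""
    rng = random.Random(seed)
    # Inner prior samples, reused for every outer draw to keep the cost low.
    inner_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_inner)]
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 1.0)               # draw from the prior
        y = d * theta + rng.gauss(0.0, sigma)     # simulate the experiment
        logs = [log_likelihood(y, t, d, sigma) for t in inner_thetas]
        m = max(logs)                             # log-mean-exp for stability
        log_evidence = m + math.log(sum(math.exp(v - m) for v in logs) / n_inner)
        total += log_likelihood(y, theta, d, sigma) - log_evidence
    return total / n_outer


for design in (0.0, 0.5, 2.0):
    exact = 0.5 * math.log(1.0 + design ** 2)     # closed form for this toy model
    estimate = expected_information_gain(design)
    print(f"d={design}: estimate={estimate:.3f}, exact={exact:.3f}")
```

For this toy model the exact value is available in closed form, which makes it a convenient check on the estimator; an optimizer would then search over `d` to maximize the estimate.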
Bayesian learning of noisy Markov decision processes
Singh, Sumeetpal S., Chopin, Nicolas, Whiteley, Nick
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.
Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions
Luo, Heng, Carrier, Pierre Luc, Courville, Aaron, Bengio, Yoshua
We apply the spike-and-slab Restricted Boltzmann Machine (ssRBM) to texture modeling. The ssRBM with tiled-convolution weight sharing (TssRBM) achieves or surpasses the state-of-the-art on texture synthesis and inpainting by parametric models. We also develop a novel RBM model with a spike-and-slab visible layer and binary variables in the hidden layer. This model is designed to be stacked on top of the TssRBM. We show the resulting deep belief network (DBN) is a powerful generative model that improves on single-layer models and is capable of modeling not only single high-resolution and challenging textures but also multiple textures.
A Traveling Salesman Learns Bayesian Networks
Sahai, Tuhin, Klus, Stefan, Dellnitz, Michael
Structure learning of Bayesian networks is an important problem that arises in numerous machine learning applications. In this work, we present a novel approach for learning the structure of Bayesian networks using the solution of an appropriately constructed traveling salesman problem. In our approach, one computes an optimal ordering (partially ordered set) of random variables using methods for the traveling salesman problem. This ordering significantly reduces the search space for the subsequent greedy optimization that computes the final structure of the Bayesian network. We demonstrate our approach of learning Bayesian networks on real-world census and weather datasets. In both cases, we show that the approach very accurately captures dependencies between random variables. We check the accuracy of the predictions based on independent studies in both application domains.
Domain Adaptations for Computer Vision Applications
A basic assumption of statistical learning theory is that train and test data are drawn from the same underlying distribution. Unfortunately, this assumption doesn't hold in many applications. Instead, ample labeled data might exist in a particular 'source' domain while inference is needed in another, 'target' domain. Domain adaptation methods leverage labeled data from both domains to improve classification on unseen data in the target domain. In this work we survey domain transfer learning methods for various application domains with focus on recent work in Computer Vision.
A survey of non-exchangeable priors for Bayesian nonparametric models
Foti, Nicholas J., Williamson, Sinead
There has recently been a spate of papers in the statistics and machine learning literature developing dependent stochastic processes and using them as priors in Bayesian nonparametric models. In this paper, we aim to provide a representative snapshot of the currently available models, to elucidate links between these models, and to provide an orienting view of the modern constructions of these processes. Traditional nonparametric priors such as the Dirichlet process [DP, 2], Chinese restaurant process [CRP, 3], Pitman-Yor process [4] and the Indian buffet process [IBP, 5] assume that our observations are exchangeable. Under the assumption of exchangeability the order of the data points does not change the probability distribution. Exchangeability is not a valid assumption for all data.
A unifying representation for a class of dependent random measures
Foti, Nicholas J., Futoma, Joseph D., Rockmore, Daniel N., Williamson, Sinead
We present a general construction for dependent random measures based on thinning Poisson processes on an augmented space. The framework is not restricted to dependent versions of a specific nonparametric model, but can be applied to all models that can be represented using completely random measures. Several existing dependent random measures can be seen as specific cases of this framework. Interesting properties of the resulting measures are derived and the efficacy of the framework is demonstrated by constructing a covariate-dependent latent feature model and topic model that obtain superior predictive performance.
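The basic idea of inducing dependence by thinning a shared Poisson process can be sketched in a few lines of Python. The example assumes a homogeneous base process on [0, 1], a Gaussian-shaped keep probability centred on each covariate value, and unweighted atoms; all of these, along with the function names, are simplifying assumptions for illustration and not the paper's construction on an augmented space.

```python
import math
import random


def sample_poisson_process(rate, rng):
    """Points of a homogeneous Poisson process on [0, 1] via exponential gaps."""
    points = []
    t = rng.expovariate(rate)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(rate)
    return points


def thin(points, keep_prob, rng):
    """Keep each point independently with probability keep_prob(point)."""
    return [x for x in points if rng.random() < keep_prob(x)]


def kernel(covariate, bandwidth=0.1):
    """Assumed keep probability: highest for points near the covariate value."""
    return lambda x: math.exp(-((x - covariate) / bandwidth) ** 2)


rng = random.Random(0)
base = sample_poisson_process(rate=50.0, rng=rng)

# Two covariate values thin the SAME base points, so nearby covariates
# tend to retain overlapping atoms while distant ones share few.
atoms_a = thin(base, kernel(0.20), rng)
atoms_b = thin(base, kernel(0.25), rng)
atoms_c = thin(base, kernel(0.80), rng)
print(len(base), len(atoms_a), len(atoms_b), len(atoms_c))
print("shared a/b:", len(set(atoms_a) & set(atoms_b)))
print("shared a/c:", len(set(atoms_a) & set(atoms_c)))
```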
Bayesian nonparametric models for ranked data
Caron, Francois, Teh, Yee Whye
We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books.
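For reference, the standard finite Plackett-Luce model that this work extends assigns a ranking the probability of repeatedly choosing the next item in proportion to its weight among the items not yet ranked. The Python sketch below computes that probability; the item names and weights are hypothetical, and the nonparametric extension with a gamma process prior is not shown.

```python
from itertools import permutations


def plackett_luce_prob(ranking, weights):
    """Probability of a full ranking under the finite Plackett-Luce model."""
    remaining = sum(weights[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining   # choose the next item by weight
        remaining -= weights[item]          # remove it from the choice set
    return prob


# Hypothetical positive weights for three items.
weights = {"a": 3.0, "b": 2.0, "c": 1.0}
print(plackett_luce_prob(["a", "b", "c"], weights))

# Probabilities over all orderings of the items should sum to one.
total = sum(plackett_luce_prob(list(p), weights) for p in permutations(weights))
print(total)
```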