Goto

Collaborating Authors

 Bayesian Learning


Probabilistic Movement Primitives

Neural Information Processing Systems

Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based MPs, due to their compact representation of the inherently continuous and high dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, a MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.


Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Neural Information Processing Systems

A long term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.


Reconciling "priors " & "priors " without prejudice? Rรฉmi Gribonval

Neural Information Processing Systems

There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization.


Learning word embeddings efficiently with noise-contrastive estimation

Neural Information Processing Systems

Continuous-valued word embeddings learned by neural language models have recently been shown to capture semantic and syntactic information about words very well, setting performance records on several word similarity tasks. The best results are obtained by learning high-dimensional embeddings from very large quantities of data, which makes scalability of the training method a critical factor. We propose a simple and scalable new approach to learning word embeddings based on training log-bilinear models with noise-contrastive estimation. Our approach is simpler, faster, and produces better results than the current state-of-theart method. We achieve results comparable to the best ones reported, which were obtained on a cluster, using four times less data and more than an order of magnitude less computing time. We also investigate several model types and find that the embeddings learned by the simpler models perform at least as well as those learned by the more complex ones.


d9d4f495e875a2e075a1a4a6e1b9770f-Reviews.html

Neural Information Processing Systems

"Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively" puts forward a new take on Bayesian integration of multimodal cues. Instead of assuming a special area in the brain, where evidence from various sensory cues is combined (as in Ma and all, 2006), the authors consider a scenario, whereby each area receiving direct afferent input from a single modality (i.e. In the example analysed by the authors, and under a number of suitable assumptions, the cue integration they observe in their networks is close to Bayes-optimal. Building up on work of Fung and all (2010), the authors derive theoretical predictions for the integration of information in reciprocally coupled ring attractors (CANNs), which they also confirm by simulations. The reader is led through the general steps of the analysis, while details are provided in the supplementary material.


Memoized Online Variational Inference for Dirichlet Process Mixture Models

Neural Information Processing Systems

Variational inference algorithms provide the most effective framework for largescale training of Bayesian nonparametric models. Stochastic online approaches are promising, but are sensitive to the chosen learning rate and often converge to poor local optima. We present a new algorithm, memoized online variational inference, which scales to very large (yet finite) datasets while avoiding the complexities of stochastic gradient. Our algorithm maintains finite-dimensional sufficient statistics from batches of the full dataset, requiring some additional memory but still scaling to millions of examples. Exploiting nested families of variational bounds for infinite nonparametric models, we develop principled birth and merge moves allowing non-local optimization. Births adaptively add components to the model to escape local optima, while merges remove redundancy and improve speed. Using Dirichlet process mixture models for image clustering and denoising, we demonstrate major improvements in robustness and accuracy.


Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

Neural Information Processing Systems

Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning using the ImageNet hierarchy as the source of possible concepts, with human annotators to provide ground truth labels as to whether a new image is an instance of each concept using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization.


cbb6a3b884f4f88b3a8e3d44c636cbd8-Reviews.html

Neural Information Processing Systems

The authors study whether and when a hierarchical classifier can be more beneficial than its flat counterpart. They proof a generalization bound that provides an explanation when a flat and when a hierarchical classifier should be used. Additionally, the authors provide an approach for logistic regression and naive Bayes classifiers, which enables pruning of nodes in large-scale hierarchies. Quality: The authors consider a very interesting and up-to-date problem. Therefore I was very glad to read this paper. The first bound obtained by the authors is very interesting and indeed provides an explanation of existing empirical results.


Relevance Topic Model for Unstructured Social Group Activity Recognition

Neural Information Processing Systems

Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a "relevance topic model" for jointly learning meaningful mid-level representations upon bagof-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to explain better video data containing complex contents and make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy, particularly in the case of a very small number of labeled training videos.


c06d06da9666a219db15cf575aff2824-Reviews.html

Neural Information Processing Systems

REVIEWER 5: Yes, clarifying that we assume chordality is useful, and will revise the title, abstract and elsewhere to emphasize this assumption. REVIEWER 6: The reviewer's summary of the proof of Lemma 4 about the balancing condition is accurate. We may have been a bit pedantic in spelling out the details of the proof, but on the other hand, simply saying that the balancing condition "obviously" holds because of the running intersection property would not be very informative either, and we would rather err on the side of giving too much details rather than too little. The standard Bayesian approach we use for model learning is statistically consistent for choosing the correct dimensionality, since prior distribution assigned to model parameters acts as a regularizer. This property is so widely established in the literature that we did not consider it to be necessary to emphasize the aspect in the paper.