 Bayesian Learning


Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

Neural Information Processing Systems

Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning, using the ImageNet hierarchy as the source of possible concepts. Human annotators provide ground-truth labels as to whether a new image is an instance of each concept, using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show that a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization.
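
To make the idea of feeding classifier probabilities into a Bayesian generalization model concrete, here is a minimal sketch. It is not the authors' algorithm: the toy hierarchy, the uniform prior, the function name `posterior_over_concepts`, and the likelihood (classifier mass falling inside a concept, divided by the concept's number of leaves, in the spirit of a size principle) are all assumptions made for illustration.

```python
from math import prod

# Hypothetical toy hierarchy: each candidate concept maps to the leaf
# classes it covers. A real system would derive this from ImageNet.
HIERARCHY = {
    "dalmatian": {"dalmatian"},
    "dog": {"dalmatian", "poodle"},
    "animal": {"dalmatian", "poodle", "tabby"},
}


def posterior_over_concepts(classifier_outputs, hierarchy=HIERARCHY):
    """Return a posterior over concepts given per-image leaf probabilities.

    classifier_outputs: list of dicts mapping leaf name -> probability,
    one dict per example image (assumed output of a visual classifier).
    Assumed likelihood per image: (probability mass inside the concept)
    divided by the number of leaves in the concept, so smaller concepts
    that still explain the examples are preferred. The prior is uniform.
    """
    scores = {}
    for concept, leaves in hierarchy.items():
        per_image = [
            sum(p.get(leaf, 0.0) for leaf in leaves) / len(leaves)
            for p in classifier_outputs
        ]
        scores[concept] = prod(per_image)
    total = sum(scores.values())
    return {concept: score / total for concept, score in scores.items()}


# Three images that the classifier thinks are dalmatians.
looks_dalmatian = {"dalmatian": 0.9, "poodle": 0.08, "tabby": 0.02}
looks_poodle = {"dalmatian": 0.08, "poodle": 0.9, "tabby": 0.02}

narrow = posterior_over_concepts([looks_dalmatian] * 3)
mixed = posterior_over_concepts([looks_dalmatian, looks_poodle, looks_dalmatian])
print(max(narrow, key=narrow.get), max(mixed, key=mixed.get))
```

Under these assumptions, three dalmatian-like images favor the narrow concept, while a mix of dalmatian-like and poodle-like images shifts the posterior to the intermediate concept rather than the broadest one.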


cbb6a3b884f4f88b3a8e3d44c636cbd8-Reviews.html

Neural Information Processing Systems

The authors study whether and when a hierarchical classifier is more beneficial than its flat counterpart. They prove a generalization bound that explains when a flat classifier and when a hierarchical classifier should be used. Additionally, the authors provide an approach for logistic regression and naive Bayes classifiers that enables pruning of nodes in large-scale hierarchies. Quality: The authors consider a very interesting and up-to-date problem, so I was very glad to read this paper. The first bound obtained by the authors is very interesting and indeed provides an explanation of existing empirical results.


Relevance Topic Model for Unstructured Social Group Activity Recognition

Neural Information Processing Systems

Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a "relevance topic model" for jointly learning meaningful mid-level representations upon bag-of-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to better explain video data containing complex contents and to make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state-of-the-art performance and outperforms other supervised topic models in terms of classification accuracy, particularly in the case of a very small number of labeled training videos.


c06d06da9666a219db15cf575aff2824-Reviews.html

Neural Information Processing Systems

REVIEWER 5: Yes, clarifying that we assume chordality is useful, and we will revise the title, abstract and elsewhere to emphasize this assumption. REVIEWER 6: The reviewer's summary of the proof of Lemma 4 about the balancing condition is accurate. We may have been a bit pedantic in spelling out the details of the proof, but on the other hand, simply saying that the balancing condition "obviously" holds because of the running intersection property would not be very informative either, and we would rather err on the side of giving too much detail than too little. The standard Bayesian approach we use for model learning is statistically consistent for choosing the correct dimensionality, since the prior distribution assigned to model parameters acts as a regularizer. This property is so widely established in the literature that we did not consider it necessary to emphasize it in the paper.


Learning Chordal Markov Networks by Constraint Satisfaction (authors include Johan Pensar; affiliations listed: University of Helsinki, Aalto University, Åbo Akademi University, Finland)

Neural Information Processing Systems

We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints, which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions, such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove the optimality of certain networks that were previously found by stochastic search.


Analyzing Hogwild Parallel Gaussian Gibbs Sampling

Neural Information Processing Systems

Sampling inference methods are computationally difficult to scale for many models in part because global dependencies can reduce opportunities for parallel computation. Without strict conditional independence structure among variables, standard Gibbs sampling theory requires sample updates to be performed sequentially, even if dependence between most variables is not strong. Empirical work has shown that some models can be sampled effectively by going "Hogwild" and simply running Gibbs updates in parallel with only periodic global communication, but the successes and limitations of such a strategy are not well understood. As a step towards such an understanding, we study the Hogwild Gibbs sampling strategy in the context of Gaussian distributions. We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. In particular, we show that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean.
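
The claim about the sample mean can be illustrated with a small sketch. This is not the paper's framework: it only simulates the most extreme update schedule (every coordinate updated in parallel from the previous state) on a hand-picked 3x3 precision matrix, and all names (`J`, `h`, `hogwild_gibbs_mean`, `jacobi_mean`) are hypothetical. Strict row diagonal dominance is used here as a simple sufficient special case of generalized diagonal dominance. The target density is assumed proportional to exp(-x'Jx/2 + h'x), so its mean solves J mu = h.

```python
import random


def is_strictly_diagonally_dominant(J):
    """Check |J[i][i]| > sum of |off-diagonal entries| in every row."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(J)
    )


def jacobi_mean(J, h, iters=200):
    """Solve J mu = h by Jacobi iteration (converges under the check above)."""
    n = len(h)
    mu = [0.0] * n
    for _ in range(iters):
        mu = [
            (h[i] - sum(J[i][j] * mu[j] for j in range(n) if j != i)) / J[i][i]
            for i in range(n)
        ]
    return mu


def hogwild_gibbs_mean(J, h, n_sweeps=20000, burn_in=500, seed=0):
    """Average of fully parallel Gibbs sweeps.

    Every coordinate is resampled from its Gaussian conditional using the
    previous sweep's values for all other coordinates, i.e. with no
    sequential dependence between updates inside a sweep.
    """
    rng = random.Random(seed)
    n = len(h)
    x = [0.0] * n
    totals = [0.0] * n
    for sweep in range(n_sweeps + burn_in):
        old = x
        x = [
            (h[i] - sum(J[i][j] * old[j] for j in range(n) if j != i)) / J[i][i]
            + rng.gauss(0.0, 1.0) / J[i][i] ** 0.5  # conditional std = J_ii^-1/2
            for i in range(n)
        ]
        if sweep >= burn_in:
            totals = [t + xi for t, xi in zip(totals, x)]
    return [t / n_sweeps for t in totals]


J = [[2.0, 0.5, 0.3], [0.5, 2.0, 0.4], [0.3, 0.4, 2.0]]
h = [1.0, 0.0, -1.0]
print(is_strictly_diagonally_dominant(J))
print(jacobi_mean(J, h))
print(hogwild_gibbs_mean(J, h))
```

In this synchronous case the expected state follows a Jacobi iteration, which is one instance of the connection to numerical linear algebra mentioned above. The sketch says nothing about the sample covariance, which need not match the target.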


Flexible sampling of discrete data correlations without the marginal distributions

Neural Information Processing Systems

Learning the joint dependence of discrete variables is a fundamental problem in machine learning, with many applications including prediction, clustering and dimensionality reduction. More recently, the framework of copula modeling has gained popularity due to its modular parameterization of joint distributions. Among other properties, copulas provide a recipe for combining flexible models for univariate marginal distributions with parametric families suitable for potentially high dimensional dependence structures. More radically, the extended rank likelihood approach of Hoff (2007) bypasses learning marginal models completely when such information is ancillary to the learning task at hand as in, e.g., standard dimensionality reduction problems or copula parameter estimation. The main idea is to represent data by their observable rank statistics, ignoring any other information from the marginals. Inference is typically done in a Bayesian framework with Gaussian copulas, and it is complicated by the fact that this implies sampling within a space where the number of constraints increases quadratically with the number of data points. The result is slow mixing when using off-the-shelf Gibbs sampling. We present an efficient algorithm based on recent advances in constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying a quadratic cost in sample size.
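
The rank constraints behind the extended rank likelihood can be sketched as follows. This is an illustration under assumptions, not the paper's sampler: the helper names `rank_bounds` and `count_order_constraints` are hypothetical, and the code only handles a single observed variable `y` with its latent Gaussian counterpart `z`. The constraint it encodes is that `y[j] < y[i]` implies `z[j] < z[i]`.

```python
def rank_bounds(y, z, i):
    """Interval that latent z[i] must stay in to respect the ranks of y.

    y: observed discrete values for one variable.
    z: current latent values, one per observation.
    Returns (lower, upper): the largest latent value among observations
    ranked below y[i], and the smallest among those ranked above it.
    """
    lower = max(
        (z[j] for j in range(len(y)) if y[j] < y[i]), default=float("-inf")
    )
    upper = min(
        (z[j] for j in range(len(y)) if y[j] > y[i]), default=float("inf")
    )
    return lower, upper


def count_order_constraints(y):
    """Number of pairwise order constraints implied by the ranks of y."""
    n = len(y)
    return sum(1 for i in range(n) for j in range(i + 1, n) if y[i] != y[j])


y = [0, 1, 1, 2]
z = [-1.0, 0.2, 0.5, 1.3]
print(rank_bounds(y, z, 1))
print(count_order_constraints(y))
```

A Gibbs sampler in this setting would redraw each `z[i]` from a Gaussian truncated to its interval. The pairwise count grows on the order of n squared when there are few ties, which is the quadratic growth in constraints referred to above.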


Spectral methods for neural characterization using generalized quadratic models (Il Memming Park, Evan Archer, Jonathan W. Pillow)

Neural Information Processing Systems

We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron's stimulus selectivity in terms of a set of linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range.
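
The three stages of the GQM can be sketched as a forward model. The specific choices here are assumptions for illustration only: an exponential nonlinearity, Poisson noise as the exponential-family member, and the hypothetical names `gqm_rate` and `sample_poisson`. The low-rank quadratic term is written as a weighted sum of squared filter outputs, which equals x'Cx for C built from the filters.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gqm_rate(x, filters, weights, b, a):
    """Firing rate f(Q(x)) with Q(x) = sum_k c_k (w_k . x)^2 + b . x + a.

    filters: list of linear receptive fields w_k (one per rank).
    weights: quadratic combination weights c_k.
    f is assumed to be exp, which is invertible and maps to positive rates.
    """
    quadratic = sum(c * dot(w, x) ** 2 for w, c in zip(filters, weights))
    return math.exp(quadratic + dot(b, x) + a)


def sample_poisson(rate, rng):
    """Draw a spike count with Knuth's method (fine for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


stimulus = [1.0, 2.0]
rate = gqm_rate(stimulus, filters=[[1.0, 0.0]], weights=[0.5], b=[0.0, 0.25], a=-1.0)
spikes = sample_poisson(rate, random.Random(0))
print(rate, spikes)
```

With a single filter the quadratic form has rank one. Adding filters raises the rank while keeping the parameter count far below that of a full quadratic form over a high-dimensional stimulus.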


Machine Teaching for Bayesian Learners in the Exponential Family

Neural Information Processing Systems

What if there is a teacher who knows the learning goal and wants to design good training data for a machine learner? We propose an optimal teaching framework aimed at learners who employ Bayesian models. Our framework is expressed as an optimization problem over teaching examples that balance the future loss of the learner and the effort of the teacher. This optimization problem is in general hard. In the case where the learner employs conjugate exponential family models, we present an approximate algorithm for finding the optimal teaching set.
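
The trade-off between learner loss and teacher effort can be sketched for one conjugate exponential family learner. This is a brute-force illustration, not the approximate algorithm of the paper: the Beta-Bernoulli learner, the squared-error loss on the posterior mean, the per-example effort cost, and the name `best_teaching_set` are all assumptions.

```python
def best_teaching_set(target, prior_a=1.0, prior_b=1.0, effort_cost=1e-4, max_n=20):
    """Search teaching sets of coin flips for a Beta-Bernoulli learner.

    The learner starts from Beta(prior_a, prior_b) and updates on the
    teaching set; its posterior mean is (a + heads) / (a + b + n).
    Objective (assumed): squared error between the posterior mean and
    the target, plus effort_cost times the number of examples.
    Returns (heads, tails, objective) for the best set found.
    """
    best = None
    for n in range(max_n + 1):
        for heads in range(n + 1):
            posterior_mean = (prior_a + heads) / (prior_a + prior_b + n)
            objective = (posterior_mean - target) ** 2 + effort_cost * n
            if best is None or objective < best[2]:
                best = (heads, n - heads, objective)
    return best


print(best_teaching_set(0.75))
print(best_teaching_set(0.75, effort_cost=1.0))
```

Because the posterior of a conjugate learner depends on the data only through sufficient statistics, the search runs over counts rather than over ordered sequences of examples. When effort is expensive, the best teaching set under this objective is the empty one.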


Real-Time Inference for a Gamma Process Model of Neural Spiking (authors include David Carlson and Lawrence Carin)

Neural Information Processing Systems

With simultaneous measurements from ever-increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-the-art. Via exploratory data analysis--using data with partial ground truth as well as two novel data sets--we find that several features of our model collectively contribute to our improved performance, including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain.