Bayesian Inference
Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
Hoyer, Patrik O., Hyvärinen, Aapo
The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as 'noise', purely detrimental to the sensory system. In this paper, we propose an alternative view in which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, the responses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme, we provide simulations suggesting how some aspects of response variability might be understood in this framework.
Learning with Multiple Labels
In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if class labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naive methods degrade rapidly as more ambiguity is introduced into the labels. 1 Introduction Supervised and unsupervised learning problems have been extensively studied in the machine learning literature. In supervised classification each training instance is associated with a single class label, while in unsupervised classification (i.e.
A Model for Learning Variance Components of Natural Images
Karklin, Yan, Lewicki, Michael S.
We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a nonlinear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higher-order image structure and provides a way to learn coarse-coded, sparse-distributed representations of abstract image properties such as object location, scale, and texture.
Learning Sparse Topographic Representations with Products of Student-t Distributions
Welling, Max, Osindero, Simon, Hinton, Geoffrey E.
We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Student-t distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the "iterated Wiener filter" for the purpose of denoising images.
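The product form described in this abstract can be illustrated with a minimal sketch. It assumes one common parameterization of a Student-t expert, in which each filter output y contributes a factor (1 + y²/2)^(-alpha) to the unnormalized probability. The function name `pot_energy`, the toy filters and the `alphas` values are hypothetical, and the topographic grouping of filter outputs and the learning procedure are not covered.

```python
import math

def pot_energy(image, filters, alphas):
    """Unnormalized negative log-probability of an image under a
    product of Student-t experts (assumed parameterization)."""
    energy = 0.0
    for w, alpha in zip(filters, alphas):
        # Linear filter output: dot product of the filter with the image.
        y = sum(wi * xi for wi, xi in zip(w, image))
        # Each expert adds alpha * log(1 + y^2 / 2); the heavy tails
        # penalize large outputs only logarithmically, favoring sparsity.
        energy += alpha * math.log(1.0 + 0.5 * y * y)
    return energy

# Toy 3-pixel "images" and two difference filters (illustrative values).
filters = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
alphas = [1.5, 1.5]
flat = pot_energy([0.3, 0.3, 0.3], filters, alphas)  # no filter responds
edge = pot_energy([0.0, 0.0, 1.0], filters, alphas)  # one filter responds
```

Lower energy corresponds to higher unnormalized probability, so the constant image is scored as more probable than the image with a step in it. The normalizing constant is never computed here, which is the reason maximum likelihood learning is intractable in such models.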
Learning Sparse Multiscale Image Representations
Sallee, Phil, Olshausen, Bruno A.
We describe a method for learning sparse multiscale image representations using a sparse prior distribution over the basis function coefficients. The prior consists of a mixture of a Gaussian and a Dirac delta function, and thus encourages coefficients to have exact zero values. Coefficients for an image are computed by sampling from the resulting posterior distribution with a Gibbs sampler. The learned basis is similar to the Steerable Pyramid basis, and yields slightly higher SNR for the same number of active coefficients. Denoising using the learned image model is demonstrated for some standard test images, with results that compare favorably with other denoising methods.
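To make the prior concrete, the following sketch draws coefficients from a mixture of a Dirac delta at zero and a zero-mean Gaussian. It illustrates the prior only, not the Gibbs sampler over the posterior. The function name and the parameters `p_active` and `sigma` are hypothetical choices for illustration, not values from the paper.

```python
import random

def sample_sparse_coefficients(n, p_active=0.1, sigma=1.0, seed=0):
    """Draw n coefficients from a delta-plus-Gaussian mixture prior."""
    rng = random.Random(seed)
    coeffs = []
    for _ in range(n):
        if rng.random() < p_active:
            # Active coefficient: drawn from the Gaussian component.
            coeffs.append(rng.gauss(0.0, sigma))
        else:
            # Inactive coefficient: the delta component gives exactly zero.
            coeffs.append(0.0)
    return coeffs

coeffs = sample_sparse_coefficients(1000)
n_zero = sum(1 for c in coeffs if c == 0.0)
```

Because the delta component places finite probability mass on zero, most sampled coefficients are exactly zero rather than merely small, which is what distinguishes this prior from a smooth heavy-tailed one.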
Bayesian Image Super-Resolution
Tipping, Michael E., Bishop, Christopher M.
The extraction of a single high-quality image from a set of low-resolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a posteriori) point optimization of the high resolution image and associated registration parameters. 1 Introduction The task in super-resolution is to combine a set of low resolution images of the same scene in order to obtain a single image of higher resolution. Provided the individual low resolution images have sub-pixel displacements relative to each other, it is possible to extract high frequency details of the scene well beyond the Nyquist limit of the individual source images.
Learning Graphical Models with Mercer Kernels
Bach, Francis R., Jordan, Michael I.
We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
Discriminative Learning for Label Sequences via Boosting
Altun, Yasemin, Hofmann, Thomas, Johnson, Mark
Well-known applications include part-of-speech (POS) tagging, named entity classification, information extraction, text segmentation and phoneme classification in text and speech processing [7] as well as problems like protein homology detection, secondary structure prediction or gene classification in computational biology [3]. Up to now, the predominant formalism for modeling and predicting label sequences has been based on Hidden Markov Models (HMMs) and variations thereof. Yet, despite their success, generative probabilistic models - of which HMMs are a special case - have two major shortcomings, which this paper is not the first one to point out. First, generative probabilistic models are typically trained using maximum likelihood estimation (MLE) for a joint sampling model of observation and label sequences. As has been emphasized frequently, MLE based on the joint probability model is inherently non-discriminative and thus may lead to suboptimal prediction accuracy. Secondly, efficient inference and learning in this setting often requires making questionable conditional independence assumptions.
VIBES: A Variational Inference Engine for Bayesian Networks
Bishop, Christopher M., Spiegelhalter, David, Winn, John
In recent years variational methods have become a popular tool for approximate inference and learning in a wide variety of probabilistic models. For each new application, however, it is currently necessary first to derive the variational update equations, and then to implement them in application-specific code. Each of these steps is both time consuming and error prone. In this paper we describe a general purpose inference engine called VIBES ('Variational Inference for Bayesian Networks') which allows a wide variety of probabilistic models to be implemented and solved variationally without recourse to coding. New models are specified either through a simple script or via a graphical interface analogous to a drawing package. VIBES then automatically generates and solves the variational equations. We illustrate the power and flexibility of VIBES using examples from Bayesian mixture modelling.
Regularized Greedy Importance Sampling
Southey, Finnegan, Schuurmans, Dale, Ghodsi, Ali
Greedy importance sampling is an unbiased estimation technique that reduces the variance of standard importance sampling by explicitly searching for modes in the estimation objective. Previous work has demonstrated the feasibility of implementing this method and proved that the technique is unbiased in both discrete and continuous domains. In this paper we present a reformulation of greedy importance sampling that eliminates the free parameters from the original estimator, and introduces a new regularization strategy that further reduces variance without compromising unbiasedness. The resulting estimator is shown to be effective for difficult estimation problems arising in Markov random field inference. In particular, improvements are achieved over standard MCMC estimators when the distribution has multiple peaked modes.
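For reference, the standard importance sampling baseline that this abstract builds on can be sketched as follows. This is not the greedy or regularized estimator from the paper. The target N(0, 1), the proposal N(0, 2²), and the helper names are assumptions chosen so that the true answer, E[x²] = 1, is known.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_estimate(f, n_samples, seed=0):
    """Estimate E_p[f(x)] for p = N(0, 1) using samples from q = N(0, 2^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 2.0)  # draw from the proposal q
        # Importance weight p(x) / q(x) corrects for sampling from q.
        w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)
        total += w * f(x)
    return total / n_samples

# The second moment of a standard normal is 1, so the estimate should be close to 1.
estimate = importance_estimate(lambda x: x * x, 20000)
```

The estimator is unbiased whenever the proposal covers the support of the target, but its variance grows when the proposal misses regions where p(x)·f(x) is large. That weakness is what a mode-seeking scheme such as the one described above aims to address.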