Directed Networks
Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
Fischer, Bernd, Schumann, Johann, Buntine, Wray, Gray, Alexander G.
Machine learning has reached a point where many probabilistic methods canbe understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized fordifferent models. Here, we describe the AUTOBAYES system which takes a high-level statistical model specification, uses powerful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efficient specialized algorithm for learning that model, and generates executable code implementing that algorithm. This capability is far beyond that of code collections such as Matlab toolboxes oreven tools for model-independent optimization such as BUGS for Gibbs sampling: complex new algorithms can be generated without newprogramming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient and commented code can be generated for different languages or systems.
Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
Hoyer, Patrik O., Hyvรคrinen, Aapo
The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as'noise', purely detrimental to the sensory system. In this paper, we propose an alternative viewin which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, theresponses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme,we provide simulations suggesting how some aspects of response variability might be understood in this framework.
Learning with Multiple Labels
In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if classes labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naIve methods degrade rapidly as more ambiguity is introduced into the labels. 1 Introduction Supervised and unsupervised learning problems have been extensively studied in the machine learning literature. In supervised classification each training instance is associated with a single class label, while in unsupervised classification (i.e.
A Model for Learning Variance Components of Natural Images
Karklin, Yan, Lewicki, Michael S.
We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a nonlinear generalization ofindependent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higherorder imagestructure and provides a way to learn coarse-coded, sparsedistributed representationsof abstract image properties such as object location, scale, and texture.
Learning Sparse Topographic Representations with Products of Student-t Distributions
Welling, Max, Osindero, Simon, Hinton, Geoffrey E.
We propose a model for natural images in which the probability of an image isproportional to the product of the probabilities of some filter outputs. Weencourage the system to find sparse features by using a Studentt distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learnsa topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the "iterated Wiener filter" for the purpose ofdenoising images.
Learning Sparse Multiscale Image Representations
Sallee, Phil, Olshausen, Bruno A.
We describe a method for learning sparse multiscale image representations usinga sparse prior distribution over the basis function coefficients. The prior consists of a mixture of a Gaussian and a Dirac delta function, and thus encourages coefficients to have exact zero values. Coefficients for an image are computed by sampling from the resulting posterior distribution with a Gibbs sampler. The learned basis is similar to the Steerable Pyramid basis, and yields slightly higher SNR for the same number of active coefficients. Denoising usingthe learned image model is demonstrated for some standard test images, with results that compare favorably with other denoising methods.
Dynamic Structure Super-Resolution
The problem of super-resolution involves generating feasible higher resolution images, which are pleasing to the eye and realistic, from a given low resolution image. This might be attempted by using simplefilters for smoothing out the high resolution blocks or through applications where substantial prior information is used to imply the textures and shapes which will occur in the images. In this paper we describe an approach which lies between the two extremes. It is a generic unsupervised method which is usable in all domains, but goes beyond simple smoothing methods in what it achieves. We use a dynamic treelike architecture to model the high resolution data. Approximate conditioning on the low resolution image is achieved through a mean field approach.
Bayesian Image Super-Resolution
Tipping, Michael E., Bishop, Christopher M.
The extraction of a single high-quality image from a set of lowresolution imagesis an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction ofstill images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution imageto the observed low resolution images, using regularization toresolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parametersis based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a-posteriori) point optimization of the high resolution image and associated registration parameters. 1 Introduction The task in super-resolution is to combine a set of low resolution images of the same scene in order to obtain a single image of higher resolution. Provided the individual low resolution images have sub-pixel displacements relative to each other, it is possible to extract high frequency details of the scene well beyond the Nyquist limit of the individual source images.
Learning Graphical Models with Mercer Kernels
Bach, Francis R., Jordan, Michael I.
We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.