Learning Sparse Multiscale Image Representations
Sallee, Phil, Olshausen, Bruno A.
We describe a method for learning sparse multiscale image representations using a sparse prior distribution over the basis function coefficients. The prior consists of a mixture of a Gaussian and a Dirac delta function, and thus encourages coefficients to have exact zero values. Coefficients for an image are computed by sampling from the resulting posterior distribution with a Gibbs sampler. The learned basis is similar to the Steerable Pyramid basis, and yields slightly higher SNR for the same number of active coefficients. Denoising using the learned image model is demonstrated for some standard test images, with results that compare favorably with other denoising methods.
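To make the effect of the Gaussian-plus-delta prior concrete, here is a minimal Python sketch for a single coefficient observed under additive Gaussian noise. It is an illustration only, not the authors' multiscale Gibbs sampler: the scalar model y = a + noise, the function names, and the parameters `p_active`, `slab_var` and `noise_var` are all assumptions introduced here.

```python
import math
import random


def gaussian_pdf(x, variance):
    """Density at x of a zero-mean Gaussian with the given variance."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def posterior_active_prob(y, p_active, slab_var, noise_var):
    """P(a != 0 | y) for y = a + noise, with a ~ (1 - p) * delta_0 + p * N(0, slab_var)."""
    # If the coefficient is active, y is Gaussian with the summed variances.
    active = p_active * gaussian_pdf(y, slab_var + noise_var)
    # If the coefficient is exactly zero, y is pure noise.
    inactive = (1.0 - p_active) * gaussian_pdf(y, noise_var)
    return active / (active + inactive)


def sample_coefficient(y, p_active, slab_var, noise_var, rng=random):
    """Draw one posterior sample: exactly 0.0, or a draw from the shrunken Gaussian."""
    if rng.random() >= posterior_active_prob(y, p_active, slab_var, noise_var):
        return 0.0
    shrink = slab_var / (slab_var + noise_var)
    post_mean = shrink * y                     # observation shrunk toward zero
    post_std = math.sqrt(shrink * noise_var)   # posterior standard deviation
    return rng.gauss(post_mean, post_std)
```

In this sketch, small observations are almost always mapped to an exact zero, while large ones are kept and shrunk slightly, which is the sparsifying behavior the abstract attributes to the delta component.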
Dynamic Structure Super-Resolution
The problem of super-resolution involves generating feasible higher resolution images, which are pleasing to the eye and realistic, from a given low resolution image. This might be attempted by using simple filters for smoothing out the high resolution blocks or through applications where substantial prior information is used to imply the textures and shapes which will occur in the images. In this paper we describe an approach which lies between the two extremes. It is a generic unsupervised method which is usable in all domains, but goes beyond simple smoothing methods in what it achieves. We use a dynamic tree-like architecture to model the high resolution data. Approximate conditioning on the low resolution image is achieved through a mean field approach.
A Bilinear Model for Sparse Coding
Grimes, David B., Rao, Rajesh P. N.
Recent algorithms for sparse coding and independent component analysis (ICA) have demonstrated how localized features can be learned from natural images. However, these approaches do not take image transformations into account. As a result, they produce image codes that are redundant because the same feature is learned at multiple locations. We describe an algorithm for sparse coding based on a bilinear generative model of images. By explicitly modeling the interaction between image features and their transformations, the bilinear approach helps reduce redundancy in the image code and provides a basis for transformation-invariant vision. We present results demonstrating bilinear sparse coding of natural images. We also explore an extension of the model that can capture spatial relationships between the independent features of an object, thereby providing a new framework for parts-based object recognition.
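As a rough illustration of what "bilinear" means here, the sketch below synthesizes an image from two coefficient vectors, one standing for features and one for transformations. The abstract does not give the exact parameterization, so the generic form z[k] = sum over i, j of basis[i][j][k] * x[i] * y[j], and the names `basis`, `x` and `y`, are assumptions made for this example.

```python
def bilinear_synthesize(basis, x, y):
    """Return z with z[k] = sum_i sum_j basis[i][j][k] * x[i] * y[j].

    basis[i][j] is a list of pixel values (one basis image per feature/transform pair),
    x holds the feature coefficients and y holds the transformation coefficients.
    """
    n_pixels = len(basis[0][0])
    z = [0.0] * n_pixels
    for i, x_i in enumerate(x):
        for j, y_j in enumerate(y):
            weight = x_i * y_j  # the two codes interact multiplicatively
            for k in range(n_pixels):
                z[k] += basis[i][j][k] * weight
    return z
```

With `y` held fixed the output is linear in `x`, and vice versa, so a single feature code can in principle be reused across transformations rather than being relearned at every location.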
Bayesian Image Super-Resolution
Tipping, Michael E., Bishop, Christopher M.
The extraction of a single high-quality image from a set of low-resolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a posteriori) point optimization of the high resolution image and associated registration parameters.
1 Introduction
The task in super-resolution is to combine a set of low resolution images of the same scene in order to obtain a single image of higher resolution. Provided the individual low resolution images have sub-pixel displacements relative to each other, it is possible to extract high frequency details of the scene well beyond the Nyquist limit of the individual source images.
A Prototype for Automatic Recognition of Spontaneous Facial Actions
Bartlett, M.S., Littlewort, G.C., Sejnowski, T.J., Movellan, J.R.
Spontaneous facial expressions differ substantially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects deliberately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an approach based on 3-D warping of images into canonical views. We evaluated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models. This system employed general purpose learning mechanisms that can be applied to recognition of any facial movement. The system was tested for recognition of a set of facial actions defined by the Facial Action Coding System (FACS). We showed that 3-D tracking and warping, followed by machine learning techniques applied directly to the warped images, is a viable and promising technology for automatic facial expression recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged out of filters which were derived from the statistics of images.
Fast Transformation-Invariant Factor Analysis
Kannan, Anitha, Jojic, Nebojsa, Frey, Brendan
Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced the mixture of transformation-invariant component analyzers (MTCA), which can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction.
Learning to Detect Natural Image Boundaries Using Brightness and Texture
Martin, David R., Fowlkes, Charless C., Malik, Jitendra
The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, a classifier is trained using human labeled images as ground truth. We present precision-recall curves showing that the resulting detector outperforms existing approaches.
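For readers unfamiliar with precision-recall curves, the following Python sketch computes one from per-pixel detector scores and binary ground-truth labels. It is a simplified illustration: it assumes an exact pixel-by-pixel comparison, which may differ from the matching procedure used in the paper, and the function names are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold counts as a detection."""
    tp = fp = fn = 0
    for score, is_boundary in zip(scores, labels):
        detected = score >= threshold
        if detected and is_boundary:
            tp += 1      # true boundary correctly detected
        elif detected:
            fp += 1      # detection with no boundary underneath
        elif is_boundary:
            fn += 1      # true boundary that was missed
    # Convention assumed here: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pr_curve(scores, labels):
    """One (threshold, precision, recall) triple per distinct score, strictest first."""
    thresholds = sorted(set(scores), reverse=True)
    return [(t,) + precision_recall(scores, labels, t) for t in thresholds]
```

Lowering the threshold trades precision for recall, so comparing detectors over the whole curve avoids committing to a single operating point.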
Monaural Speech Separation
Monaural speech separation has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with speech in the high-frequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. Motivated by this, we propose a model for monaural separation that deals with low-frequency and high-frequency signals differently. For resolved harmonics, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the model generates segments based on amplitude modulation (AM) in addition to temporal continuity and groups them according to AM repetition rates derived from sinusoidal modeling. Underlying the separation process is a pitch contour obtained according to psychoacoustic constraints. Our model is systematically evaluated, and it yields substantially better performance than previous systems, especially in the high-frequency range.
An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. It is based on two other Markovian models, namely Asynchronous Input/Output Hidden Markov Models and Pair Hidden Markov Models. An EM algorithm to train the model is presented, as well as a Viterbi decoder that can be used to obtain the optimal state sequence as well as the alignment between the two sequences. The model has been tested on an audio-visual speech recognition task using the M2VTS database and yielded robust performance under various noise conditions.
1 Introduction
Hidden Markov Models (HMMs) are statistical tools that have been used successfully over the last 30 years to model difficult tasks such as speech recognition [6] or biological sequence analysis [4]. They are very well suited to handle discrete or continuous sequences of varying sizes.
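As background for the Viterbi decoder mentioned in the abstract, here is a minimal Python sketch of Viterbi decoding for an ordinary single-sequence discrete HMM. It is not the asynchronous model of the paper, which additionally aligns two sequences; the function signature and the dictionary-based parameter layout are assumptions chosen for readability.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely state path of a standard discrete HMM, computed in the log domain.

    log_start[s], log_trans[r][s] and log_emit[s][obs] hold log-probabilities.
    Returns (path, log-probability of that path jointly with the observations).
    """
    # best[s] = log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor that maximizes the score of reaching s.
            prev = max(states, key=lambda r: prev_best[r] + log_trans[r][s])
            pointers[s] = prev
            best[s] = prev_best[prev] + log_trans[prev][s] + log_emit[s][obs]
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]
```

Working with log-probabilities avoids numerical underflow on long sequences, which matters for speech-length inputs.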