AITopics

Coughlan, James M., Yuille, Alan L.

The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

Our focus, however, is on the discovery of scene statistics which are useful for solving visual inference problems. For example, in related work [5] we have analyzed the statistics of filter responses on and off edges and hence derived effective edge detectors.

artificial intelligence, bayesian inference, orientation

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > Colorado (0.14)

Genre: Research Report (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)

Slaney, Malcolm, Covell, Michele

FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks

FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices.
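
The idea of projecting audio and image features onto a single axis and scoring the result with Pearson's correlation can be sketched as follows. This is a minimal illustration, not the FaceSync implementation: the function names `first_canonical_pair` and `sync_score`, the synthetic "audio" and "video" feature matrices, and the SVD-based route to the canonical directions are all assumptions made for the example.

```python
import numpy as np

def first_canonical_pair(X, Y):
    """Return (a, b, rho): projection weights and the first canonical correlation.

    X is (n_frames, p) and Y is (n_frames, q). Both are assumed to have
    full column rank, with n_frames larger than p and q.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Thin SVDs give orthonormal bases for the centered data, so no
    # covariance or correlation matrix is formed explicitly.
    Ux, sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    U, s, Vt = np.linalg.svd(Ux.T @ Uy)
    a = Vxt.T @ (U[:, 0] / sx)   # weights for the columns of X
    b = Vyt.T @ (Vt[0] / sy)     # weights for the columns of Y
    return a, b, s[0]

def sync_score(X, Y, a, b):
    """Pearson correlation between the two one-dimensional projections."""
    return np.corrcoef(X @ a, Y @ b)[0, 1]

# Hypothetical data: one shared latent signal drives both feature sets.
rng = np.random.default_rng(0)
n = 500
latent = rng.standard_normal(n)
audio = np.outer(latent, [1.0, -0.5, 0.3, 0.8]) + 0.1 * rng.standard_normal((n, 4))
video = np.outer(latent, [0.7, 0.2, -1.0]) + 0.1 * rng.standard_normal((n, 3))

a, b, rho = first_canonical_pair(audio, video)
in_sync = sync_score(audio, video, a, b)
# Shifting the video by 50 frames simulates a badly synchronized track.
out_of_sync = sync_score(audio, np.roll(video, 50, axis=0), a, b)
```

With these assumptions, `in_sync` equals the first canonical correlation `rho`, while the shifted track scores close to zero because the latent signal here has no temporal structure.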

artificial intelligence, correlation, machine learning

Country:

Europe (0.28)
North America > United States > California (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Olshausen, Bruno A., Sallee, Phil, Lewicki, Michael S.

Learning Sparse Image Codes using a Wavelet Pyramid Architecture

We show how a wavelet basis may be adapted to best represent natural images in terms of sparse coefficients. The wavelet basis, which may be either complete or overcomplete, is specified by a small number of spatial functions which are repeated across space and combined in a recursive fashion so as to be self-similar across scale. These functions are adapted to minimize the estimated code length under a model that assumes images are composed of a linear superposition of sparse, independent components. When adapted to natural images, the wavelet bases take on different orientations and they evenly tile the orientation domain, in stark contrast to the standard, non-oriented wavelet bases used in image compression. When the basis set is allowed to be overcomplete, it also yields higher coding efficiency than standard wavelet bases. 1 Introduction The general problem we address here is that of learning efficient codes for representing natural images.
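
The model of an image as a linear superposition of sparse components can be illustrated with a small inference example. The sketch below is not the method of the paper: it assumes a fixed random overcomplete dictionary `Phi` rather than a learned wavelet pyramid, swaps in an L1 penalty as the sparseness term, and uses iterative soft-thresholding (ISTA) to find the coefficients. The names `soft_threshold`, `objective`, and `ista`, and the value of `lam`, are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Phi, x, a, lam):
    """Squared reconstruction error plus an L1 sparseness penalty."""
    r = x - Phi @ a
    return 0.5 * r @ r + lam * np.abs(a).sum()

def ista(Phi, x, lam, n_iter=500):
    """Minimize the objective over a with iterative soft-thresholding."""
    # Step size 1/L, where L is the squared largest singular value of Phi.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a

# Hypothetical overcomplete dictionary: 32 unit-norm basis functions in 16 dimensions.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((16, 32))
Phi /= np.linalg.norm(Phi, axis=0)

# A signal built from only three active coefficients.
a_true = np.zeros(32)
a_true[[3, 11, 27]] = [1.5, -2.0, 1.0]
x = Phi @ a_true

lam = 0.05
a_hat = ista(Phi, x, lam)
cost_start = objective(Phi, x, np.zeros(32), lam)
cost_end = objective(Phi, x, a_hat, lam)
```

Learning the basis itself, as the abstract describes, would additionally adapt `Phi` between inference passes; that step is left out here.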

artificial intelligence, coefficient, machine learning

Country: North America > United States (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)

Gray, Michael S., Sejnowski, Terrence J., Movellan, Javier R.

A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

These methods are compared on their performance on a visual speech recognition task. While the representations developed are specific to visual speech recognition, the methods themselves are general purpose and applicable to other tasks. Our focus is on low-level data-driven methods based on the statistical properties of relatively untouched images, as opposed to approaches that work with contours or highly processed versions of the image. Padgett [8] and Bartlett [1] systematically studied statistical methods for developing representations on expression recognition tasks. They found that local wavelet-like representations consistently outperformed global representations, like eigenfaces. In this paper we also compare local versus global representations.

artificial intelligence, representation, speech recognition

Country: North America > United States > California (0.30)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.90)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Wong, K. Y. Michael, Nishimori, Hidetoshi

Stagewise Processing in Error-correcting Codes and Image Restoration

Both mean-field analysis using the cavity method and simulations show that it has the advantage of being robust against uncertainties in hyperparameter estimation. 1 Introduction In error-correcting codes [1] and image restoration [2], the choice of the so-called hyperparameters is an important factor in determining their performances.

artificial intelligence, machine learning, selective freezing

Country:

Asia > China > Hong Kong (0.15)
Asia > Japan > Honshū (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.87)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Yang, Zhiyong, Zemel, Richard S.

Managing Uncertainty in Cue Combination

Neural Information Processing Systems, Dec-31-2000

We develop a hierarchical generative model to study cue combination. The model maps a global shape parameter to local cue-specific parameters, which in turn generate an intensity image. Inferring shape from images is achieved by inverting this model. Inference produces a probability distribution at each level; using distributions rather than a single value of underlying variables at each stage preserves information about the validity of each local cue for the given image. This allows the model, unlike standard combination models, to adaptively weight each cue based on general cue reliability and specific image context.
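
Why carrying a distribution per cue, rather than a single value, allows reliability-based weighting can be seen in a small example. The sketch below is not the hierarchical generative model from the abstract: it assumes each cue yields an independent Gaussian estimate of the same shape parameter and fuses them by precision weighting. The function name `combine_gaussian_cues` and the numbers are hypothetical.

```python
import numpy as np

def combine_gaussian_cues(means, variances):
    """Fuse independent Gaussian estimates of one parameter.

    Each cue is weighted by its precision (1 / variance), so a cue that is
    uncertain for the current image contributes less to the combined estimate.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    combined_variance = 1.0 / precisions.sum()
    combined_mean = combined_variance * (precisions * means).sum()
    return combined_mean, combined_variance

# Two hypothetical cues: the second is four times as precise as the first,
# so the combined mean lies much closer to 2.0 than to 0.0.
mean, variance = combine_gaussian_cues([0.0, 2.0], [1.0, 0.25])
```

Because the variances are inputs, they can differ from image to image, which is the property a single point estimate per cue would discard.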

artificial intelligence, machine learning, representation

Country: North America > United States > Arizona (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Wainwright, Martin J., Simoncelli, Eero P.

Scale Mixtures of Gaussians and the Statistics of Natural Images

Neural Information Processing Systems, Dec-31-2000

The statistics of photographic images, when represented using multiscale (wavelet) bases, exhibit two striking types of non-Gaussian behavior. First, the marginal densities of the coefficients have extended heavy tails. Second, the joint densities exhibit variance dependencies not captured by second-order models. We examine properties of the class of Gaussian scale mixtures, and show that these densities can accurately characterize both the marginal and joint distributions of natural image wavelet coefficients. This class of model suggests a Markov structure, in which wavelet coefficients are linked by hidden scaling variables corresponding to local image structure. We derive an estimator for these hidden variables, and show that a nonlinear "normalization" procedure can be used to Gaussianize the coefficients.
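
The heavy tails of a Gaussian scale mixture, and the effect of dividing out the hidden scale, can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the estimator derived in the paper: the hidden multiplier is drawn from a log-normal distribution, each group of 16 coefficients shares one multiplier, and the multiplier is estimated from the root-mean-square of the other coefficients in the group. The names `excess_kurtosis`, `z`, and `z_hat` are hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.ravel(x) - np.mean(x)
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_groups, group_size = 20000, 16

# Hidden scaling variable, shared by all coefficients in a group (assumed log-normal).
z = np.exp(0.5 * rng.standard_normal((n_groups, 1)))
u = rng.standard_normal((n_groups, group_size))
x = z * u  # Gaussian scale mixture samples

# Estimate each coefficient's scale from the energy of the other
# coefficients in its group, then divide it out.
energy_of_others = (x**2).sum(axis=1, keepdims=True) - x**2
z_hat = np.sqrt(energy_of_others / (group_size - 1))
x_normalized = x / z_hat

kurt_raw = excess_kurtosis(x)
kurt_normalized = excess_kurtosis(x_normalized)
```

Under these assumptions the raw samples show clearly positive excess kurtosis, while the normalized samples are much closer to Gaussian. A small residual remains because the scale is only estimated from a finite neighborhood.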

artificial intelligence, machine learning, wavelet coefficients

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence (0.69)