AITopics

Computer animated agents and robots bring a social dimension to human computer interaction and force us to think in new ways about how computers could be used in daily life. Face to face communication is a real-time process operating at a time scale of less than a second. In this paper we present progress on a perceptual primitive to automatically detect frontal faces in the video stream and code them with respect to 7 dimensions in real time: neutral, anger, disgust, fear, joy, sadness, surprise. The face finder employs a cascade of feature detectors trained with boosting techniques [13, 2]. The expression recognizer employs a novel combination of Adaboost and SVM's. The generalization performance to new subjects for a 7-way forced choice was 93.3% and 97% correct on two publicly available datasets. The outputs of the classifier change smoothly as a function of time, providing a potentially valuable representation to code facial expression dynamics in a fully automatic and unobtrusive manner. The system was deployed and evaluated for measuring spontaneous facial expressions in the field in an application for automatic assessment of human-robot interaction.

adaboost, expression, facial expression, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Kumar, Sanjiv, Hebert, Martial

Discriminative Fields for Modeling Spatial Dependencies in Natural Images

In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments.

drf model, experiment, interaction potential, (13 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Fink, Michael, Perona, Pietro

Mutual Boosting for Contextual Inference

Mutual Boosting is a method aimed at incorporating contextual information to augment object detection. When multiple detectors of objects and parts are trained in parallel using AdaBoost [1], object detectors might use the remaining intermediate detectors to enrich the weak learner set. This method generalizes the efficient features suggested by Viola and Jones [2] thus enabling information inference between parts and objects in a compositional hierarchy. In our experiments eye-, nose-, mouth-and face detectors are trained using the Mutual Boosting framework. Results show that the method outperforms applications overlooking contextual information. We suggest that achieving contextual integration is a step toward humanlike detection capabilities.

detection, detector, information, (15 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Gruber, Amit, Weiss, Yair

Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

The problem of "Structure From Motion" is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences.

algorithm, matrix, sequence, (14 more...)

Country: Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)

Industry:

Media > Television (0.55)
Media > Photography (0.55)
Media > Film (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Quality (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Murphy, Kevin P., Torralba, Antonio, Freeman, William T.

Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification.

computer vision, detection, detector, (11 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Vision (0.90)

Wang, Zhou, Simoncelli, Eero P.

Local Phase Coherence and the Perception of Blur

Blur is one of the most common forms of image distortion. It can arise from a variety of sources, such as atmospheric scatter, lens defocus, optical aberrations of the lens, and spatial and temporal sensor integration. Human observers are bothered by blur, and our visual systems are quite good at reporting whether an image appears blurred (or sharpened) [1, 2]. However, the mechanism by which this is accomplished is not well understood. Clearly, detection of blur requires some model of what constitutes an unblurred image. In recent years, there has been a surge of interest in the modelling of natural images, both for purposes of improving the performance of image processing and computer vision systems, and also for furthering our understanding of biological visual systems.

coefficient, coherence, phase coherence, (13 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Industry: Media > Photography (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science > Data Quality > Data Transformation (0.49)

Nakatani, Tomohiro, Miyoshi, Masato, Kinoshita, Keisuke

One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals

Speech dereverberation is desirable with a view to achieving, for example, robust speech recognition in the real world. However, it is still a challenging problem, especially when using a single microphone. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The present methods learn the dereverberation filter from a lot of speech data with no prior knowledge of the data, and can achieve high quality speech dereverberation especially when the reverberation time is long.

dereverberation operator, impulse response, speech signal, (13 more...)

Country:

Asia > Japan > Honshū > Tōhoku > Akita Prefecture > Akita (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.49)

Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis

Kwok, James T., Mak, Brian, Ho, Simon

Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA, in particular kernel PCA, may be even more effective. One major challenge is to map the feature-space eigenvoices back to the observation space so that the state observation likelihoods can be computed during the estimation of eigenvoice weights and subsequent decoding. Our solution is to compute kernel PCA using composite kernels, and we will call our new method kernel eigenvoice speaker adaptation. On the TIDIGITS corpus, we found that compared with a speaker-independent model, our kernel eigenvoice adaptation method can reduce the word error rate by 28-33% while the standard eigenvoice approach can only match the performance of the speaker-independent model.

eigenvoice, eigenvoice weight, speaker supervector, (10 more...)

Country:

Asia > China > Hong Kong (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Achan, Kannan, Roweis, Sam T., Frey, Brendan J.

Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Many techniques for complex speech processing such as denoising and deconvolution, time/frequency warping, multiple speaker separation, and multiple microphone analysis operate on sequences of short-time power spectra (spectrograms), a representation which is often well-suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram.

algorithm, spectrogram, speech signal, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)

Samejima, Kazuyuki, Doya, Kenji, Ueda, Yasumasa, Kimura, Minoru

Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter

When we model a higher order functions, such as learning and memory, we face a difficulty of comparing neural activities with hidden variables that depend on the history of sensory and motor signals and the dynamics of the network. Here, we propose novel method for estimating hidden variables of a learning agent, such as connection weights from sequences of observable variables. Bayesian estimation is a method to estimate the posterior probability of hidden variables from observable data sequence using a dynamic model of hidden and observable variables. In this paper, we apply particle filter for estimating internal parameters and metaparameters of a reinforcement learning model. We verified the effectiveness of the method using both artificial data and real animal behavioral data.

probability, reinforcement, sequence, (15 more...)