AITopics

Blur is one of the most common forms of image distortion.

artificial intelligence, coefficient, data quality, (16 more...)

Country: North America > United States > New York (0.14)

Industry: Media > Photography (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Data Science > Data Quality > Data Transformation (0.49)

Roman, Nicoleta, Wang, Deliang, Brown, Guy J.

A Classification-based Cocktail-party Processor

At a cocktail party, a listener can selectively attend to a single voice and filter out other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, we employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency unit. Within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID.
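The ideal time-frequency binary mask described above can be illustrated with a minimal sketch. It assumes the target and interference energies are available separately for each time-frequency unit, which holds only for premixed training data. The function name `ideal_binary_mask` and the toy energy values are hypothetical and are not taken from the paper.

```python
def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for each time-frequency unit where the target is strictly
    stronger than the interference, and 0 otherwise."""
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]


# Toy energies (assumed values): rows are frequency channels,
# columns are time frames.
target = [[4.0, 0.5, 2.0],
          [0.1, 3.0, 1.0]]
interference = [[1.0, 2.0, 2.0],
                [0.2, 1.0, 0.5]]

mask = ideal_binary_mask(target, interference)
print(mask)
```

A unit where the two energies are equal is assigned 0 here because the comparison is strict; that tie-breaking choice is an assumption of this sketch.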

artificial intelligence, interference, machine learning, (16 more...)

Country:

Europe > United Kingdom (0.28)
North America > United States > Ohio (0.14)

Technology:

Information Technology > Artificial Intelligence > Speech (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.34)

Nakatani, Tomohiro, Miyoshi, Masato, Kinoshita, Keisuke

One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals

Speech dereverberation is desirable with a view to achieving, for example, robust speech recognition in the real world. However, it is still a challenging problem, especially when using a single microphone. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The present methods learn the dereverberation filter from a lot of speech data with no prior knowledge of the data, and can achieve high quality speech dereverberation especially when the reverberation time is long.

artificial intelligence, dereverberation operator, speech signal, (13 more...)

Country: Asia > Japan (0.28)

Technology: Information Technology > Artificial Intelligence > Speech (1.00)

Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis

Kwok, James T., Mak, Brian, Ho, Simon

In recent years, there has been a lot of interest in the study of kernel methods [1].

artificial intelligence, eigenvoice, machine learning, (14 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.41)

Achan, Kannan, Roweis, Sam T., Frey, Brendan J.

Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Figure 1: In the generative model, the spectrogram is obtained by taking overlapping windows of length n from the time-domain speech signal, and computing the energy spectrum.
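The spectrogram construction in this caption can be sketched as follows. This is a minimal illustration, assuming a rectangular window, a naive discrete Fourier transform, and a hop size chosen by the caller; the function name `energy_spectrogram` and the test signal are hypothetical. Taking the squared magnitude discards the phase, which is why the resulting spectrogram is phaseless.

```python
import cmath
import math


def energy_spectrogram(signal, n, hop):
    """Split `signal` into overlapping windows of length `n` (step `hop`)
    and return the energy spectrum of each window (bins 0 .. n // 2)."""
    frames = []
    for start in range(0, len(signal) - n + 1, hop):
        window = signal[start:start + n]
        spectrum = []
        for k in range(n // 2 + 1):
            # Naive DFT coefficient for frequency bin k.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(window)
            )
            # Energy keeps the squared magnitude and drops the phase.
            spectrum.append(abs(coeff) ** 2)
        frames.append(spectrum)
    return frames


# Assumed test signal: a sinusoid with 2 cycles per 8 samples.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
spec = energy_spectrogram(signal, n=8, hop=4)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

With `hop` smaller than `n`, consecutive windows overlap, as in the caption. A practical implementation would normally use an FFT and a tapered window; both are omitted here to keep the sketch self-contained.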

artificial intelligence, machine learning, spectrogram, (16 more...)

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)

Moreno, Pedro J., Ho, Purdy P., Vasconcelos, Nuno

A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

Over the last few years significant efforts have been made to develop kernels that can be applied to sequence data such as DNA, text, speech, video and images. The Fisher Kernel and similar variants have been suggested as good ways to combine an underlying generative model in the feature space and discriminant classifiers such as SVMs. In this paper we suggest an alternative procedure to the Fisher kernel for systematically finding kernel functions that naturally handle variable length sequence data in multimedia domains. In particular, for domains such as speech and images we explore the use of kernel functions that take full advantage of well known probabilistic models such as Gaussian Mixtures and single full covariance Gaussian models. We derive a kernel distance based on the Kullback-Leibler (KL) divergence between generative models. In effect our approach combines the best of both generative and discriminative methods and replaces the standard SVM kernels. We perform experiments on speaker identification/verification and image classification tasks and show that these new kernels have the best performance in speaker verification and mostly outperform the Fisher kernel based SVMs and the generative classifiers in speaker identification and image classification.
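The idea of comparing variable-length sequences through the KL divergence between their generative models can be sketched as below. This is an illustration under stated assumptions, not the paper's exact kernel: each sequence is modeled by a single univariate Gaussian, the divergence is symmetrized by summing both directions, and it is turned into a similarity by exponentiation with a scale `a`. The exponential form, the scale `a`, and all function names are assumptions that the abstract does not confirm.

```python
import math


def fit_gaussian(seq):
    """Fit a univariate Gaussian (mean, variance) to a sequence of any length."""
    mean = sum(seq) / len(seq)
    var = sum((x - mean) ** 2 for x in seq) / len(seq)
    return mean, var


def kl_gaussian(m1, v1, m2, v2):
    """Closed-form KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def symmetric_kl(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)


def kl_kernel(p, q, a=1.0):
    """Assumed similarity: exponentiated negative symmetric KL divergence."""
    return math.exp(-a * symmetric_kl(p, q))


# Two toy sequences of different lengths (assumed values).
seq_a = [0.0, 1.0, 2.0, 3.0]
seq_b = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]

model_a = fit_gaussian(seq_a)
model_b = fit_gaussian(seq_b)

k_ab = kl_kernel(model_a, model_b)
k_aa = kl_kernel(model_a, model_a)
print(k_aa, k_ab)
```

Because the kernel is computed between fitted models rather than raw samples, the two sequences do not need to have the same length. The sketch assumes every sequence has nonzero variance.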

artificial intelligence, kernel, machine learning, (19 more...)