Goto

Collaborating Authors

 Technology


Audio-Visual Sound Separation Via Hidden Markov Models

Neural Information Processing Systems

It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests theutility of audiovisual information for the task of speech enhancement. We propose a method to exploit audiovisual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factori allycombined, to incorporate visual lip information and employ novelsignal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosionin the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audiovisual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information.


ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition

Neural Information Processing Systems

A challenging, unsolved problem in the speech recognition community isrecognizing speech signals that are corrupted by loud, highly nonstationary noise. One approach to noisy speech recognition isto automatically remove the noise from the cepstrum sequence beforefeeding it in to a clean speech recognizer. In previous work published in Eurospeech, we showed how a probability model trained on clean speech and a separate probability model trained on noise could be combined for the purpose of estimating the noisefree speechfrom the noisy speech. We showed how an iterative 2nd order vector Taylor series approximation could be used for probabilistic inferencein this model. In many circumstances, it is not possible to obtain examples of noise without speech.


A Sequence Kernel and its Application to Speaker Recognition

Neural Information Processing Systems

A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion.


Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology

Neural Information Processing Systems

A flexible pattern-matching analog classifier is presented in conjunction witha robust image representation algorithm called Principal Axes Projection (PAP). In the circuit, the functional form of matching is configurable in terms of the peak position, the peak height and the sharpness of the similarity evaluation. The test chip was fabricated ina 0.6-ยตm CMOS technology and successfully applied to handwritten pattern recognition and medical radiograph analysis using PAP as a feature extraction pre-processing step for robust image coding. The separation and classification of overlapping patterns is also experimentally demonstrated.


Learning Spike-Based Correlations and Conditional Probabilities in Silicon

Neural Information Processing Systems

We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication andadaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently,our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis andexperimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon.


An Efficient Clustering Algorithm Using Stochastic Association Model and Its Implementation Using Nanostructures

Neural Information Processing Systems

This paper describes a clustering algorithm for vector quantizers using a "stochastic association model". It offers a new simple and powerful softmax adaptationrule. The adaptation process is the same as the online K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the "neural gas" algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluationprocess for each reference vector. For hardware implementation ofthis process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.


Citcuits for VLSI Implementation of Temporally Asymmetric Hebbian Learning

Neural Information Processing Systems

Experimental data has shown that synaptic strength modification in some types of biological neurons depends upon precise spike timing differencesbetween presynaptic and postsynaptic spikes. Several temporally-asymmetricHebbian learning rules motivated by this data have been proposed. We argue that such learning rules are suitable to analog VLSI implementation. We describe an easily tunablecircuit to modify the weight of a silicon spiking neuron according to those learning rules. Test results from the fabrication of the circuit using a O.6J.lm CMOS process are given.


EM-DD: An Improved Multiple-Instance Learning Technique

Neural Information Processing Systems

We present a new multiple-instance (MI) learning technique (EM DD) that combines EM with the diverse density (DD) algorithm. EM-DD is a general-purpose MI algorithm that can be applied with boolean or real-value labels and makes real-value predictions. On the boolean Musk benchmarks, the EM-DD algorithm without any tuning significantly outperforms all previous algorithms. EM-DD is relatively insensitive to the number of relevant attributes in the data set and scales up well to large bag sizes. Furthermore, EM DD provides a new framework for MI learning, in which the MI problem is converted to a single-instance setting by using EM to estimate the instance responsible for the label of the bag. 1 Introduction The multiple-instance (MI) learning model has received much attention.


A General Greedy Approximation Algorithm with Applications

Neural Information Processing Systems

Greedy approximation algorithms have been frequently used to obtain sparse solutions to learning problems. In this paper, we present a general greedy algorithm for solving a class of convex optimization problems. We derive a bound on the rate of approximation for this algorithm, and show that our algorithm includes a number of earlier studies as special cases.


Blind Source Separation via Multinode Sparse Representation

Neural Information Processing Systems

We consider a problem of blind source separation from a set of instantaneous linearmixtures, where the mixing matrix is unknown. It was discovered recently, that exploiting the sparsity of sources in an appropriate representationaccording to some signal dictionary, dramatically improves the quality of separation. In this work we use the property of multi scale transforms, such as wavelet or wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We use this intrinsic property for selecting the best (most sparse) subsets of features for further separation. The performance of the algorithm is verified onnoise-free and noisy data. Experiments with simulated signals, musical sounds and images demonstrate significant improvement of separation qualityover previously reported results. 1 Introduction