Goto

Collaborating Authors

 Country


Speech Recognition using SVMs

Neural Information Processing Systems

An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative models are directly incorporated into the definition of the score-space.


Speech Recognition with Missing Data using Recurrent Neural Nets

Neural Information Processing Systems

In the'missing data' approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regions which are dominated by the speech source. The remaining regions are considered to be'missing'. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.


Estimating the Reliability of ICA Projections

Neural Information Processing Systems

When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated to the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA-model, to enhance significantly the separation performance, and, most important, to mark the components that have a actual physical meaning.


Audio-Visual Sound Separation Via Hidden Markov Models

Neural Information Processing Systems

It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests the utility of audiovisual information for the task of speech enhancement. We propose a method to exploit audiovisual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factori ally combined, to incorporate visual lip information and employ novel signal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosion in the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audiovisual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information.


ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition

Neural Information Processing Systems

A challenging, unsolved problem in the speech recognition community is recognizing speech signals that are corrupted by loud, highly nonstationary noise. One approach to noisy speech recognition is to automatically remove the noise from the cepstrum sequence before feeding it in to a clean speech recognizer. In previous work published in Eurospeech, we showed how a probability model trained on clean speech and a separate probability model trained on noise could be combined for the purpose of estimating the noisefree speech from the noisy speech. We showed how an iterative 2nd order vector Taylor series approximation could be used for probabilistic inference in this model. In many circumstances, it is not possible to obtain examples of noise without speech.


A Sequence Kernel and its Application to Speaker Recognition

Neural Information Processing Systems

A novel approach for comparing sequences of observations using an explicit-expansion kernel is demonstrated. The kernel is derived using the assumption of the independence of the sequence of observations and a mean-squared error training criterion. The use of an explicit expansion kernel reduces classifier model size and computation dramatically, resulting in model sizes and computation one-hundred times smaller in our application. The explicit expansion also preserves the computational advantages of an earlier architecture based on mean-squared error training. Training using standard support vector machine methodology gives accuracy that significantly exceeds the performance of state-of-the-art mean-squared error training for a speaker recognition task.


Relative Density Nets: A New Way to Combine Backpropagation with HMM's

Neural Information Processing Systems

Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's.


Intransitive Likelihood-Ratio Classifiers

Neural Information Processing Systems

In this work, we introduce an information-theoretic based correction term to the likelihood ratio classification method for multiple classes. Under certain conditions, the term is sufficient for optimally correcting the difference between the true and estimated likelihood ratio, and we analyze this in the Gaussian case. We find that the new correction term significantly improves the classification results when tested on medium vocabulary speech recognition tasks. Moreover, the addition of this term makes the class comparisons analogous to an intransitive game and we therefore use several tournament-like strategies to deal with this issue. We find that further small improvements are obtained by using an appropriate tournament. Lastly, we find that intransitivity appears to be a good measure of classification confidence.


Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology

Neural Information Processing Systems

A flexible pattern-matching analog classifier is presented in conjunction with a robust image representation algorithm called Principal Axes Projection (PAP). In the circuit, the functional form of matching is configurable in terms of the peak position, the peak height and the sharpness of the similarity evaluation. The test chip was fabricated in a 0.6-µm CMOS technology and successfully applied to handwritten pattern recognition and medical radiograph analysis using PAP as a feature extraction pre-processing step for robust image coding. The separation and classification of overlapping patterns is also experimentally demonstrated.


Learning Spike-Based Correlations and Conditional Probabilities in Silicon

Neural Information Processing Systems

We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon.