Country
Speech Recognition using SVMs
An important issue in applying SVMs to speech recognition is the ability to classify variable length sequences. This paper presents extensions to a standard scheme for handling this variable length data, the Fisher score. A more useful mapping is introduced based on the likelihood-ratio. The score-space defined by this mapping avoids some limitations of the Fisher score. Class-conditional generative modelsare directly incorporated into the definition of the score-space.
Speech Recognition with Missing Data using Recurrent Neural Nets
In the'missing data' approach to improving the robustness of automatic speech recognition to added noise, an initial process identifies spectraltemporal regionswhich are dominated by the speech source. The remaining regions are considered to be'missing'. In this paper we develop a connectionist approach to the problem of adapting speech recognition to the missing data case, using Recurrent Neural Networks. In contrast to methods based on Hidden Markov Models, RNNs allow us to make use of long-term time constraints and to make the problems of classification with incomplete data and imputing missing values interact. We report encouraging results on an isolated digit recognition task.
Estimating the Reliability of ICA Projections
Meinecke, Frank C., Ziehe, Andreas, Kawanabe, Motoaki, Müller, Klaus-Robert
When applying unsupervised learning techniques like ICA or temporal decorrelation,a key question is whether the discovered projections arereliable. In other words: can we give error bars or can we assess the quality of our separation? We use resampling methods totackle these questions and show experimentally that our proposed variance estimations are strongly correlated to the separation error.We demonstrate that this reliability estimation can be used to choose the appropriate ICA-model, to enhance significantly theseparation performance, and, most important, to mark the components that have a actual physical meaning.
Audio-Visual Sound Separation Via Hidden Markov Models
Hershey, John R., Casey, Michael
It is well known that under noisy conditions we can hear speech much more clearly when we read the speaker's lips. This suggests theutility of audiovisual information for the task of speech enhancement. We propose a method to exploit audiovisual cues to enable speech separation under non-stationary noise and with a single microphone. We revise and extend HMM-based speech enhancement techniques, in which signal and noise models are factori allycombined, to incorporate visual lip information and employ novelsignal HMMs in which the dynamics of narrow-band and wide band components are factorial. We avoid the combinatorial explosionin the factorial model by using a simple approximate inference technique to quickly estimate the clean signals in a mixture. We present a preliminary evaluation of this approach using a small-vocabulary audiovisual database, showing promising improvements in machine intelligibility for speech enhanced using audio and visual information.
ALGONQUIN - Learning Dynamic Noise Models From Noisy Speech for Robust Speech Recognition
Frey, Brendan J., Kristjansson, Trausti T., Deng, Li, Acero, Alex
A challenging, unsolved problem in the speech recognition community isrecognizing speech signals that are corrupted by loud, highly nonstationary noise. One approach to noisy speech recognition isto automatically remove the noise from the cepstrum sequence beforefeeding it in to a clean speech recognizer. In previous work published in Eurospeech, we showed how a probability model trained on clean speech and a separate probability model trained on noise could be combined for the purpose of estimating the noisefree speechfrom the noisy speech. We showed how an iterative 2nd order vector Taylor series approximation could be used for probabilistic inferencein this model. In many circumstances, it is not possible to obtain examples of noise without speech.
Analog Soft-Pattern-Matching Classifier using Floating-Gate MOS Technology
Yamasaki, Toshihiko, Shibata, Tadashi
A flexible pattern-matching analog classifier is presented in conjunction witha robust image representation algorithm called Principal Axes Projection (PAP). In the circuit, the functional form of matching is configurable in terms of the peak position, the peak height and the sharpness of the similarity evaluation. The test chip was fabricated ina 0.6-µm CMOS technology and successfully applied to handwritten pattern recognition and medical radiograph analysis using PAP as a feature extraction pre-processing step for robust image coding. The separation and classification of overlapping patterns is also experimentally demonstrated.
Learning Spike-Based Correlations and Conditional Probabilities in Silicon
Shon, Aaron P., Hsu, David, Diorio, Chris
We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication andadaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently,our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis andexperimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon.
An Efficient Clustering Algorithm Using Stochastic Association Model and Its Implementation Using Nanostructures
Morie, Takashi, Matsuura, Tomohiro, Nagata, Makoto, Iwata, Atsushi
This paper describes a clustering algorithm for vector quantizers using a "stochastic association model". It offers a new simple and powerful softmax adaptationrule. The adaptation process is the same as the online K-means clustering method except for adding random fluctuation in the distortion error evaluation process. Simulation results demonstrate that the new algorithm can achieve efficient adaptation as high as the "neural gas" algorithm, which is reported as one of the most efficient clustering methods. It is a key to add uncorrelated random fluctuation in the similarity evaluationprocess for each reference vector. For hardware implementation ofthis process, we propose a nanostructure, whose operation is described by a single-electron circuit. It positively uses fluctuation in quantum mechanical tunneling processes.
EM-DD: An Improved Multiple-Instance Learning Technique
We present a new multiple-instance (MI) learning technique (EM DD) that combines EM with the diverse density (DD) algorithm. EM-DD is a general-purpose MI algorithm that can be applied with boolean or real-value labels and makes real-value predictions. On the boolean Musk benchmarks, the EM-DD algorithm without any tuning significantly outperforms all previous algorithms. EM-DD is relatively insensitive to the number of relevant attributes in the data set and scales up well to large bag sizes. Furthermore, EM DD provides a new framework for MI learning, in which the MI problem is converted to a single-instance setting by using EM to estimate the instance responsible for the label of the bag. 1 Introduction The multiple-instance (MI) learning model has received much attention.