Goto

Collaborating Authors

 Country


Analysis of Information in Speech Based on MANOVA

Neural Information Processing Systems

We propose analysis of information in speech using three sources - language (phone), speaker and channeL Information in speech is measured as mutual information between the source and the set of features extracted from speech signaL We assume that distribution offeatures can be modeled using Gaussian distribution. The mutual information is computed using the results of analysis of variability in speech. We observe similarity in the results of phone variability and phone information, and show that the results of the proposed analysis have more meaningful interpretations than the analysis of variability. 1 Introduction Speech signal carries information about the linguistic message, the speaker, the communication channeL In the previous work [1, 2], we proposed analysis of information inspeech as analysis of variability in a set of features extracted from the speech signal. The variability was measured as covariance of the features, and analysis was performed using using multivariate analysis of variance (MANOVA). Total variability was divided into three types of variabilities, namely, intra-phone (or phone) variability, speaker variability, and channel variability.


Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

Neural Information Processing Systems

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm weuse for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates;and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: avoice-to-MIDI player that synthesizes electronic music from vocalized melodies,and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user's pitch scrolling across the screen as he or she sings into the computer.


A Probabilistic Approach to Single Channel Blind Signal Separation

Neural Information Processing Systems

We present a new technique for achieving source separation when given only a single channel recording. The main idea is based on exploiting the inherent time structure of sound sources by learning a priori sets of basis filters in time domain that encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach given the observed single channel data and sets of basis filters.


Forward-Decoding Kernel-Based Phone Recognition

Neural Information Processing Systems

Forward decoding kernel machines (FDKM) combine large-margin classifiers withhidden Markov models (HMM) for maximum a posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model trained with a recursive scheme that deals effectively with noisy and partially labeleddata. Training over very large datasets is accomplished using a sparse probabilistic support vector machine (SVM) model based on quadratic entropy, and an online stochastic steepest descent algorithm. For speaker-independent continuous phone recognition, FDKM trained over 177,080 samples of the TlMIT database achieves 80.6% recognition accuracy over the full test set, without use of a prior phonetic language model. 1 Introduction Sequence estimation is at the core of many problems in pattern recognition, most notably speech and language processing. Recognizing dynamic patterns in sequential data requires a set of tools very different from classifiers trained to recognize static patterns in data assumed i.i.d.


Field-Programmable Learning Arrays

Neural Information Processing Systems

This paper introduces the Field-Programmable Learning Array, a new paradigm for rapid prototyping of learning primitives and machinelearning algorithmsin silicon. The FPLA is a mixed-signal counterpart to the all-digital Field-Programmable Gate Array in that it enables rapid prototyping of algorithms in hardware. Unlike the FPGA, the FPLA is targeted directly for machine learning by providing local, parallel, online analoglearning using floating-gate MOS synapse transistors. We present a prototype FPLA chip comprising an array of reconfigurable computational blocks and local interconnect. We demonstrate the viability ofthis architecture by mapping several learning circuits onto the prototype chip.


Spike Timing-Dependent Plasticity in the Address Domain

Neural Information Processing Systems

Address-event representation (AER), originally proposed as a means to communicate sparse neural events between neuromorphic chips, has proven efficient in implementing large-scale networks with arbitrary, configurable synaptic connectivity. In this work, we further extend the functionality of AER to implement arbitrary, configurable synaptic plasticity inthe address domain. As proof of concept, we implement a biologically inspiredform of spike timing-dependent plasticity (STDP) based on relative timing of events in an AER framework. Experimental resultsfrom an analog VLSI integrate-and-fire network demonstrate address domain learning in a task that requires neurons to group correlated inputs.


Topographic Map Formation by Silicon Growth Cones

Neural Information Processing Systems

We describe a self-configuring neuromorphic chip that uses a model of activity-dependent axon remodeling to automatically wire topographic maps based solely on input correlations. Axons are guided by growth cones, which are modeled in analog VLSI for the first time. Growth cones migrate up neurotropin gradients, which are represented by charge diffusing in transistor channels. Virtual axons move by rerouting address-events. We refined an initially gross topographic projection by simulating retinal wave input. 1 Neuromorphic Systems Neuromorphic engineers are attempting to match the computational efficiency of biological systems by morphing neurocircuitry into silicon circuits [1].


Combining Features for BCI

Neural Information Processing Systems

Recently, interest is growing to develop an effective communication interface connectingthe human brain to a computer, the'Brain-Computer Interface' (BCI). One motivation of BCI research is to provide a new communication channel substituting normal motor output in patients with severe neuromuscular disabilities. In the last decade, various neurophysiological corticalprocesses, such as slow potential shifts, movement related potentials (MRPs) or event-related desynchronization (ERD) of spontaneous EEG rhythms, were shown to be suitable for BCI, and, consequently, differentindependent approaches of extracting BCI-relevant EEGfeatures for single-trial analysis are under investigation. Here, we present and systematically compare several concepts for combining such EEGfeatures to improve the single-trial classification. Feature combinations areevaluated on movement imagination experiments with 3 subjects where EEGfeatures are based on either MRPs or ERD, or both. Those combination methods that incorporate the assumption that the single EEG-featuresare physiologically mutually independent outperform the plain method of'adding' evidence where the single-feature vectors are simply concatenated. These results strengthen the hypothesis that MRP and ERD reflect at least partially independent aspects of cortical processes and open a new perspective to boost BCI effectiveness.


Improving Transfer Rates in Brain Computer Interfacing: A Case Study

Neural Information Processing Systems

We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer ratesbased on offline analysis of EEGdata but within a more realistic setup closer to an online realization than in the original studies. The objective wasachieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated byrecent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally lead to significantly improved rates as compared with techniques applied in the original work. In combination withthe data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.