Goto

Collaborating Authors

 Europe


Bayesian Image Super-Resolution

Neural Information Processing Systems

The extraction of a single high-quality image from a set of lowresolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a-posteriori) point optimization of the high resolution image and associated registration parameters. 1 Introduction The task in super-resolution is to combine a set of low resolution images of the same scene in order to obtain a single image of higher resolution. Provided the individual low resolution images have sub-pixel displacements relative to each other, it is possible to extract high frequency details of the scene well beyond the Nyquist limit of the individual source images.


Monaural Speech Separation

Neural Information Processing Systems

Monaural speech separation has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with speech in the highfrequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. Motivated by this, we propose a model for monaural separation that deals with low-frequency and highfrequency signals differently. For resolved harmonics, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For unresolved harmonics, the model generates segments based on amplitude modulation (AM) in addition to temporal continuity and groups them according to AM repetition rates derived from sinusoidal modeling. Underlying the separation process is a pitch contour obtained according to psychoacoustic constraints. Our model is systematically evaluated, and it yields substantially better performance than previous systems, especially in the high-frequency range.


An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

Neural Information Processing Systems

They are very well suited to handle discrete of continuous sequences of varying sizes. Moreover, an efficient training algorithm (EM) is available, as well as an efficient decoding algorithm (Viterbi), which provides the optimal sequence of states (and the corresponding sequence of high level events) associated with a given sequence of low-level data. On the other hand, multimodal information processing is currently a very challenging framework of applications including multimodal person authentication, multimodal speech recognition, multimodal event analyzers, etc. In that framework, the same sequence of events is represented not only by a single sequence of data but by a series of sequences of data, each of them coming eventually from a different modality: video streams with various viewpoints, audio stream(s), etc. One such task, which will be presented in this paper, is multimodal speech recognition using both a microphone and a camera recording a speaker simultaneously while he (she) speaks.


Bayesian Estimation of Time-Frequency Coefficients for Audio Signal Enhancement

Neural Information Processing Systems

The Bayesian paradigm provides a natural and effective means of exploiting prior knowledge concerning the time-frequency structure of sound signals such as speech and music--something which has often been overlooked in traditional audio signal processing approaches. Here, after constructing a Bayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms, we apply Markov chain Monte Carlo methods in order to sample from the resultant posterior distribution of interest. We present speech enhancement results which compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, both objectively and subjectively); moreover, in contrast to such methods, our results are obtained without an assumption of prior knowledge of the noise power.


A Probabilistic Approach to Single Channel Blind Signal Separation

Neural Information Processing Systems

We present a new technique for achieving source separation when given only a single channel recording. The main idea is based on exploiting the inherent time structure of sound sources by learning a priori sets of basis filters in time domain that encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach given the observed single channel data and sets of basis filters.


Spike Timing-Dependent Plasticity in the Address Domain

Neural Information Processing Systems

Address-event representation (AER), originally proposed as a means to communicate sparse neural events between neuromorphic chips, has proven efficient in implementing large-scale networks with arbitrary, configurable synaptic connectivity. In this work, we further extend the functionality of AER to implement arbitrary, configurable synaptic plasticity in the address domain. As proof of concept, we implement a biologically inspired form of spike timing-dependent plasticity (STDP) based on relative timing of events in an AER framework. Experimental results from an analog VLSI integrate-and-fire network demonstrate address domain learning in a task that requires neurons to group correlated inputs.


Developing Topography and Ocular Dominance Using Two aVLSI Vision Sensors and a Neurotrophic Model of Plasticity

Neural Information Processing Systems

A neurotrophic model for the co-development of topography and ocular dominance columns in the primary visual cortex has recently been proposed. In the present work, we test this model by driving it with the output of a pair of neuronal vision sensors stimulated by disparate moving patterns. We show that the temporal correlations in the spike trains generated by the two sensors elicit the development of refined topography and ocular dominance columns, even in the presence of significant amounts of spontaneous activity and fixed-pattern noise in the sensors.


Classifying Patterns of Visual Motion - a Neuromorphic Approach

Neural Information Processing Systems

We report a system that classifies and can learn to classify patterns of visual motion online. The complete system is described by the dynamics of its physical network architectures. The combination of the following properties makes the system novel: Firstly, the front-end of the system consists of an aVLSI optical flow chip that collectively computes 2-D global visual motion in real-time [1]. Secondly, the complexity of the classification task is significantly reduced by mapping the continuous motion trajectories to sequences of'motion events'. And thirdly, all the network structures are simple and with the exception of the optical flow chip based on a Winner-Take-All (WTA) architecture. We demonstrate the application of the proposed generic system for a contactless man-machine interface that allows to write letters by visual motion. Regarding the low complexity of the system, its robustness and the already existing front-end, a complete aVLSI system-on-chip implementation is realistic, allowing various applications in mobile electronic devices.


Combining Features for BCI

Neural Information Processing Systems

Recently, interest is growing to develop an effective communication interface connecting the human brain to a computer, the'Brain-Computer Interface' (BCI). One motivation of BCI research is to provide a new communication channel substituting normal motor output in patients with severe neuromuscular disabilities. In the last decade, various neurophysiological cortical processes, such as slow potential shifts, movement related potentials (MRPs) or event-related desynchronization (ERD) of spontaneous EEG rhythms, were shown to be suitable for BCI, and, consequently, different independent approaches of extracting BCI-relevant EEGfeatures for single-trial analysis are under investigation. Here, we present and systematically compare several concepts for combining such EEGfeatures to improve the single-trial classification. Feature combinations are evaluated on movement imagination experiments with 3 subjects where EEGfeatures are based on either MRPs or ERD, or both. Those combination methods that incorporate the assumption that the single EEGfeatures are physiologically mutually independent outperform the plain method of'adding' evidence where the single-feature vectors are simply concatenated. These results strengthen the hypothesis that MRP and ERD reflect at least partially independent aspects of cortical processes and open a new perspective to boost BCI effectiveness.


Improving Transfer Rates in Brain Computer Interfacing: A Case Study

Neural Information Processing Systems

We adopted an approach of Farwell & Donchin [4], which we tried to improve in several aspects. The main objective was to improve the transfer rates based on offline analysis of EEGdata but within a more realistic setup closer to an online realization than in the original studies. The objective was achieved along two different tracks: on the one hand we used state-of-the-art machine learning techniques for signal classification and on the other hand we augmented the data space by using more electrodes for the interface. For the classification task we utilized SVMs and, as motivated by recent findings on the learning of discriminative densities, we accumulated the values of the classification function in order to combine several classifications, which finally lead to significantly improved rates as compared with techniques applied in the original work. In combination with the data space augmentation, we achieved competitive transfer rates at an average of 50.5 bits/min and with a maximum of 84.7 bits/min.