Goto

Collaborating Authors

 Country


Eye Micro-movements Improve Stimulus Detection Beyond the Nyquist Limit in the Peripheral Retina

Neural Information Processing Systems

Even under perfect fixation the human eye is under steady motion (tremor, microsaccades, slow drift). The "dynamic" theory of vision [1, 2] states that eye-movements can improve hyperacuity. According to this theory, eye movements are thought to create variable spatial excitation patterns on the photoreceptor grid, which will allow for better spatiotemporal summation at later stages.


Eye Movements for Reward Maximization

Neural Information Processing Systems

Recent eye tracking studies in natural tasks suggest that there is a tight link between eye movements and goal directed motor actions. However, most existing models of human eye movements provide a bottom up account that relates visual attention to attributes of the visual scene. The purpose of this paper is to introduce a new model of human eye movements that directly ties eye movements to the ongoing demands of behavior. The basic idea is that eye movements serve to reduce uncertainty about environmental variables that are task relevant. A value is assigned to an eye movement by estimating the expected cost of the uncertainty that will result if the movement is not made. If there are several candidate eye movements, the one with the highest expected value is chosen. The model is illustrated using a humanoid graphic figure that navigates on a sidewalk in a virtual urban environment. Simulations show our protocol is superior to a simple round robin scheduling mechanism.


Human and Ideal Observers for Detecting Image Curves

Neural Information Processing Systems

This paper compares the ability of human observers to detect target image curves with that of an ideal observer. The target curves are sampled from a generative model which specifies (probabilistically) the geometry and local intensity properties of the curve. The ideal observer performs Bayesian inference on the generative model using MAP estimation. Varying the probability model for the curve geometry enables us investigate whether human performance is best for target curves that obey specific shape statistics, in particular those observed on natural shapes. Experiments are performed with data on both rectangular and hexagonal lattices. Our results show that human observers' performance approaches that of the ideal observer and are, in general, closest to the ideal for conditions where the target curve tends to be straight or similar to natural statistics on curves. This suggests a bias of human observers towards straight curves and natural statistics.


A Functional Architecture for Motion Pattern Processing in MSTd

Neural Information Processing Systems

Psychophysical studies suggest the existence of specialized detectors for component motion patterns (radial, circular, and spiral), that are consistent with the visual motion properties of cells in the dorsal medial superior temporal area (MSTd) of nonhuman primates. Here we use a biologically constrained model of visual motion processing in MSTd, in conjunction with psychophysical performance on two motion pattern tasks, to elucidate the computational mechanisms associated with the processing of widefield motion patterns encountered during self-motion. In both tasks discrimination thresholds varied significantly with the type of motion pattern presented, suggesting perceptual correlates to the preferred motion bias reported in MSTd. Through the model we demonstrate that while independently responding motion pattern units are capable of encoding information relevant to the visual motion tasks, equivalent psychophysical performance can only be achieved using interconnected neural populations that systematically inhibit non-responsive units. These results suggest the cyclic trends in psychophysical performance may be mediated, in part, by recurrent connections within motion pattern responsive areas whose structure is a function of the similarity in preferred motion patterns and receptive field locations between units.


Nonlinear Processing in LGN Neurons

Neural Information Processing Systems

According to a widely held view, neurons in lateral geniculate nucleus (LGN) operate on visual stimuli in a linear fashion. There is ample evidence, however, that LGN responses are not entirely linear. To account for nonlinearities we propose a model that synthesizes more than 30 years of research in the field. Model neurons have a linear receptive field, and a nonlinear, divisive suppressive field. The suppressive field computes local root-meansquare contrast. To test this model we recorded responses from LGN of anesthetized paralyzed cats. We estimate model parameters from a basic set of measurements and show that the model can accurately predict responses to novel stimuli. The model might serve as the new standard model of LGN responses. It specifies how visual processing in LGN involves both linear filtering and divisive gain control.


Local Phase Coherence and the Perception of Blur

Neural Information Processing Systems

Blur is one of the most common forms of image distortion. It can arise from a variety of sources, such as atmospheric scatter, lens defocus, optical aberrations of the lens, and spatial and temporal sensor integration. Human observers are bothered by blur, and our visual systems are quite good at reporting whether an image appears blurred (or sharpened) [1, 2]. However, the mechanism by which this is accomplished is not well understood. Clearly, detection of blur requires some model of what constitutes an unblurred image. In recent years, there has been a surge of interest in the modelling of natural images, both for purposes of improving the performance of image processing and computer vision systems, and also for furthering our understanding of biological visual systems.


A Classification-based Cocktail-party Processor

Neural Information Processing Systems

At a cocktail party, a listener can selectively attend to a single voice and filter out other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, we employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency unit. Within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID.


One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals

Neural Information Processing Systems

Speech dereverberation is desirable with a view to achieving, for example, robust speech recognition in the real world. However, it is still a challenging problem, especially when using a single microphone. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The present methods learn the dereverberation filter from a lot of speech data with no prior knowledge of the data, and can achieve high quality speech dereverberation especially when the reverberation time is long.


Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis

Neural Information Processing Systems

Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA, in particular kernel PCA, may be even more effective. One major challenge is to map the feature-space eigenvoices back to the observation space so that the state observation likelihoods can be computed during the estimation of eigenvoice weights and subsequent decoding. Our solution is to compute kernel PCA using composite kernels, and we will call our new method kernel eigenvoice speaker adaptation. On the TIDIGITS corpus, we found that compared with a speaker-independent model, our kernel eigenvoice adaptation method can reduce the word error rate by 28-33% while the standard eigenvoice approach can only match the performance of the speaker-independent model.


Probabilistic Inference of Speech Signals from Phaseless Spectrograms

Neural Information Processing Systems

Many techniques for complex speech processing such as denoising and deconvolution, time/frequency warping, multiple speaker separation, and multiple microphone analysis operate on sequences of short-time power spectra (spectrograms), a representation which is often well-suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient optimizer can be used to find the maximum a posteriori speech signal, given the spectrogram.