 discrimination threshold



Learning a distance measure from the information-estimation geometry of data

arXiv.org Machine Learning

We introduce the Information-Estimation Metric (IEM), a novel form of distance function derived from an underlying continuous probability density over a domain of signals. The IEM is rooted in a fundamental relationship between information theory and estimation theory, which links the log-probability of a signal with the errors of an optimal denoiser, applied to noisy observations of the signal. In particular, the IEM between a pair of signals is obtained by comparing their denoising error vectors over a range of noise amplitudes. Geometrically, this amounts to comparing the score vector fields of the blurred density around the signals over a range of blur levels. We prove that the IEM is a valid global metric and derive a closed-form expression for its local second-order approximation, which yields a Riemannian metric. For Gaussian-distributed signals, the IEM coincides with the Mahalanobis distance. But for more complex distributions, it adapts, both locally and globally, to the geometry of the distribution. In practice, the IEM can be computed using a learned denoiser (analogous to generative diffusion models) and solving a one-dimensional integral. To demonstrate the value of our framework, we learn an IEM on the ImageNet database. Experiments show that this IEM is competitive with or outperforms state-of-the-art supervised image quality metrics in predicting human perceptual judgments.
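The Gaussian special case mentioned above can be illustrated directly, since the Mahalanobis distance has a closed form. The following is a minimal sketch assuming NumPy; the function name `mahalanobis` and the example covariance are illustrative and do not come from the paper, and the sketch does not implement the IEM itself.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y under covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the explicit inverse.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

# Hypothetical 2-D Gaussian with variance 4 along the first axis and 1 along the second.
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
origin = [0.0, 0.0]
d_wide = mahalanobis([2.0, 0.0], origin, cov)    # step along the high-variance axis
d_narrow = mahalanobis([0.0, 2.0], origin, cov)  # same step along the low-variance axis
```

The same Euclidean step counts as a larger distance along the low-variance axis, which is the sense in which such a distance adapts to the geometry of the distribution.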


1baff70e2669e8376347efd3a874a341-Reviews.html

Neural Information Processing Systems

COMMENTS BASED ON REVIEWER DISCUSSIONS AND AUTHOR REBUTTAL: I agree with the other reviewers that more could be done to constrain the specifics of the cue integration mechanism. However, I believe that if the data set is expanded, allowing the models to be better constrained, then the paper is appropriate and interesting for the NIPS community. I have left my quality score as it was, but I agree with the other reviewers that the paper merits a "1" rather than a "2" for impact score. ORIGINAL REVIEW: Summary: This paper extends an existing model for the perception of visual speed that uses a Bayesian observer model acting on the activity of independent spatiotemporal frequency channels. Previously, the model accounted for illusions of perceived speed by postulating the Bayes-optimal combination of noisy sensory representations with a prior for slow speeds.


Reviews: Eigen-Distortions of Hierarchical Representations

Neural Information Processing Systems

The submission presents a method to generate image distortions that are maximally/minimally discriminable in a certain image representation. The maximally/minimally discriminable distortion directions are defined as the eigenvectors of the Fisher Information Matrix with the largest/smallest eigenvalues. Distortions are generated for image representations in VGG-16 as well as for representations in models that were trained to predict human sensitivity to image distortions. Human discrimination thresholds for those distortions are measured. It is found that the difference in human discrimination threshold between max and min distortions of the model is largest for a biologically inspired 'early vision' model that was trained to predict human sensitivity, compared to a CNN trained to predict human sensitivity or the VGG-16 representations. For the VGG representations, it is found that the difference in detection threshold for humans is larger for min/max distortions of earlier layers than for later layers.


Eigen-Distortions of Hierarchical Representations

Neural Information Processing Systems

We develop a method for comparing hierarchical image representations in terms of their ability to explain perceptual sensitivity in humans. Specifically, we utilize Fisher information to establish a model-derived prediction of sensitivity to local perturbations of an image. For a given image, we compute the eigenvectors of the Fisher information matrix with largest and smallest eigenvalues, corresponding to the model-predicted most- and least-noticeable image distortions, respectively. For human subjects, we then measure the amount of each distortion that can be reliably detected when added to the image. We use this method to test the ability of a variety of representations to mimic human perceptual sensitivity.


Optimal integration of visual speed across different spatiotemporal frequency channels

Neural Information Processing Systems

How do humans perceive the speed of a coherent motion stimulus that contains motion energy in multiple spatiotemporal frequency bands? Here we tested the idea that perceived speed is the result of an integration process that optimally combines speed information across independent spatiotemporal frequency channels. We formalized this hypothesis with a Bayesian observer model that combines the likelihood functions provided by the individual channel responses (cues). We experimentally validated the model with a 2AFC speed discrimination experiment that measured subjects' perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We found that the perceived speeds of the combined stimuli are independent of the relative phase of the underlying grating components. The results also show that the discrimination thresholds are smaller for the combined stimuli than for the individual grating components, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for the full psychometric functions of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for coherent motion stimuli of arbitrary spatial structure.
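The claim that combined stimuli yield smaller discrimination thresholds follows from standard cue combination. The sketch below is a simplified illustration, not the authors' observer model: it assumes each channel supplies an independent Gaussian likelihood over speed and a flat prior, and the function name `combine_cues` and the numbers are hypothetical.

```python
import numpy as np

def combine_cues(means, sigmas):
    """Combine independent Gaussian likelihoods by multiplying them.

    Returns the mean and standard deviation of the product, which is
    again Gaussian: a precision-weighted average with reduced variance.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    combined_var = 1.0 / precisions.sum()
    combined_mean = combined_var * (precisions * means).sum()
    return float(combined_mean), float(np.sqrt(combined_var))

# Two hypothetical channels reporting speeds 4 and 6 with equal reliability.
mean, sigma = combine_cues([4.0, 6.0], [1.0, 1.0])
```

Under these assumptions the combined standard deviation is always below that of the most reliable single cue, so a threshold proportional to it shrinks when cues are combined.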


Tactile Weight Rendering: A Review for Researchers and Developers

arXiv.org Artificial Intelligence

Haptic rendering of weight plays an essential role in naturalistic object interaction in virtual environments. While kinesthetic devices have traditionally been used for this aim by applying forces on the limbs, tactile interfaces acting on the skin have recently offered potential solutions to enhance or substitute kinesthetic ones. Here, we aim to provide an in-depth overview and comparison of existing tactile weight rendering approaches. We categorized these approaches based on their type of stimulation into asymmetric vibration and skin stretch, further divided according to the working mechanism of the devices. Then, we compared these approaches using various criteria, including physical, mechanical, and perceptual characteristics of the reported devices and their potential applications. We found that asymmetric vibration devices have the smallest form factor, while skin stretch devices relying on the motion of flat surfaces, belts, or tactors present numerous mechanical and perceptual advantages for scenarios requiring more accurate weight rendering. Finally, we discussed the selection of the proposed categorization of devices and their application scopes, together with the limitations and opportunities for future research. We hope this study guides the development and use of tactile interfaces to achieve a more naturalistic object interaction and manipulation in virtual environments.


Direct information transfer rate optimisation for SSVEP-based BCI

arXiv.org Machine Learning

In this work, a classification method for SSVEP-based BCI is proposed. The classification method uses features extracted by traditional SSVEP-based BCI methods and finds optimal discrimination thresholds for each feature to classify the targets. Optimising the thresholds is formalised as a maximisation task of a performance measure of BCIs called information transfer rate (ITR). However, instead of the standard method of calculating ITR, which makes certain assumptions about the data, a more general formula is derived to avoid incorrect ITR calculation when the standard assumptions are not met. This allows the optimal discrimination thresholds to be automatically calculated and thus eliminates the need for manual parameter selection or performing computationally expensive grid searches. The proposed method shows good performance in classifying targets of a BCI, outperforming previously reported results on the same dataset by a factor of 2 in terms of ITR. The highest achieved ITR on the used dataset was 62 bit/min. The proposed method also provides a way to reduce false classifications, which is important in real-world applications.
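For reference, the standard ITR calculation that the abstract contrasts against can be sketched as follows. This is the widely used formula that assumes equally probable targets and errors spread uniformly over the wrong targets; it is not the more general formula derived in the paper. The function name `standard_itr` and the example values are hypothetical.

```python
import math

def standard_itr(n_targets, accuracy, seconds_per_selection):
    """Standard information transfer rate in bits per minute."""
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_targets)
    # Terms of the form p * log2(p) are taken as 0 when p == 0.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / seconds_per_selection

# Hypothetical setups: perfect 4-target classification every 2 s,
# and chance-level 2-target classification every 1 s.
itr_perfect = standard_itr(4, 1.0, 2.0)
itr_chance = standard_itr(2, 0.5, 1.0)
```

When the assumptions behind this formula do not hold, for example when some targets are confused far more often than others, the value it returns can misstate the actual rate, which is the motivation given above for a more general formula.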


Eigen-Distortions of Hierarchical Representations

Neural Information Processing Systems

We develop a method for comparing hierarchical image representations in terms of their ability to explain perceptual sensitivity in humans. Specifically, we utilize Fisher information to establish a model-derived prediction of sensitivity to local perturbations of an image. For a given image, we compute the eigenvectors of the Fisher information matrix with largest and smallest eigenvalues, corresponding to the model-predicted most- and least-noticeable image distortions, respectively. For human subjects, we then measure the amount of each distortion that can be reliably detected when added to the image. We use this method to test the ability of a variety of representations to mimic human perceptual sensitivity. We find that the early layers of VGG16, a deep neural network optimized for object recognition, provide a better match to human perception than later layers, and a better match than a 4-stage convolutional neural network (CNN) trained on a database of human ratings of distorted image quality. On the other hand, we find that simple models of early visual processing, incorporating one or more stages of local gain control, trained on the same database of distortion ratings, provide substantially better predictions of human sensitivity than either the CNN, or any combination of layers of VGG16.
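The eigenvector computation described above can be sketched on a toy model. The example assumes a deterministic response corrupted by additive white Gaussian noise, in which case the Fisher information matrix is the Jacobian transposed times the Jacobian. The single-layer `tanh` model, the weights `W`, and all function names are hypothetical stand-ins, and a dense eigendecomposition is only practical at this toy scale.

```python
import numpy as np

def fisher_information(x, W):
    """Fisher information of the toy response tanh(W @ x) plus white Gaussian noise."""
    # Jacobian of tanh(W @ x) with respect to x: diag(1 - tanh^2(W @ x)) @ W
    gain = 1.0 - np.tanh(W @ x) ** 2
    jac = gain[:, None] * W
    return jac.T @ jac

def eigen_distortions(x, W):
    """Return (most noticeable, least noticeable) directions and their eigenvalues."""
    F = fisher_information(x, W)
    eigvals, eigvecs = np.linalg.eigh(F)  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvecs[:, 0], eigvals[-1], eigvals[0]

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))   # hypothetical model weights
x = rng.normal(size=4)        # hypothetical 4-pixel "image"
e_max, e_min, lam_max, lam_min = eigen_distortions(x, W)
```

Here `e_max` is the unit-norm perturbation the toy model predicts to be most noticeable and `e_min` the least noticeable; in the experiments described above, it is human thresholds for such directions that are measured and compared.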


Optimal integration of visual speed across different spatiotemporal frequency channels

Neural Information Processing Systems

How does the human visual system compute the speed of a coherent motion stimulus that contains motion energy in different spatiotemporal frequency bands? Here we propose that perceived speed is the result of optimal integration of speed information from independent spatiotemporal frequency tuned channels. We formalize this hypothesis with a Bayesian observer model that treats the channel activity as independent cues, which are optimally combined with a prior expectation for slow speeds. We test the model against behavioral data from a 2AFC speed discrimination task in which we measured subjects' perceived speed of drifting sinusoidal gratings with different contrasts and spatial frequencies, and of various combinations of these single gratings. We find that perceived speed of the combined stimuli is independent of the relative phase of the underlying grating components, and that the perceptual biases and discrimination thresholds are always smaller for the combined stimuli, supporting the cue combination hypothesis. The proposed Bayesian model fits the data well, accounting for perceptual biases and thresholds of both simple and combined stimuli. Fits are improved if we assume that the channel responses are subject to divisive normalization, which is in line with physiological evidence. Our results provide an important step toward a more complete model of visual motion perception that can predict perceived speeds for stimuli of arbitrary spatial structure.