Collaborating Authors

 Mingolla, Ennio


Tracking objects that change in appearance with phase synchrony

arXiv.org Artificial Intelligence

Objects we encounter often change appearance as we interact with them. Changes in illumination (shadows), object pose, or movement of nonrigid objects can drastically alter available image features. How do biological visual systems track objects as they change? It may involve specific attentional mechanisms for reasoning about the locations of objects independently of their appearances -- a capability that prominent neuroscientific theories have associated with computing through neural synchrony. We computationally test the hypothesis that the implementation of visual attention through neural synchrony underlies the ability of biological visual systems to track objects that change in appearance over time. We first introduce a novel deep learning circuit that can learn to precisely control attention to features separately from their location in the world through neural synchrony: the complex-valued recurrent neural network (CV-RNN). Next, we compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs), using FeatureTracker: a large-scale challenge that asks observers to track objects as their locations and appearances change in precisely controlled ways. While humans effortlessly solved FeatureTracker, state-of-the-art DNNs did not. In contrast, our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization as a neural substrate for tracking appearance-morphing objects as they move about.


Extreme Image Transformations Facilitate Robust Latent Object Representations

arXiv.org Artificial Intelligence

Adversarial attacks can affect the object recognition capabilities of machines in the wild. These can often result from spurious correlations between input and class labels, and are prone to memorization in large networks. While networks are expected to do automated feature selection, it is not effective at the scale of the object. Humans, however, are able to select the minimum set of features required to form a robust representation of an object. In this work, we show that finetuning any pretrained off-the-shelf network with Extreme Image Transformations (EIT) not only helps in learning a robust latent representation, it also improves the performance of these networks against common adversarial attacks of various intensities. Our EIT trained networks show strong activations in the object regions even when tested with more intense noise, showing promising generalizations across different kinds of adversarial attacks.


Extreme Image Transformations Affect Humans and Machines Differently

arXiv.org Artificial Intelligence

Some recent artificial neural networks (ANNs) claim to model aspects of primate neural and human performance data. Their success in object recognition is, however, dependent on exploiting low-level features for solving visual tasks in a way that humans do not. As a result, out-of-distribution or adversarial input is often challenging for ANNs. Humans instead learn abstract patterns and are mostly unaffected by many extreme image distortions. We introduce a set of novel image transforms inspired by neurophysiological findings and evaluate humans and ANNs on an object recognition task. We show that machines perform better than humans for certain transforms and struggle to perform on par with humans on others that are easy for humans. We quantify the differences in accuracy for humans and machines and find a ranking of difficulty for our transforms for human data. We also suggest how certain characteristics of human visual processing can be adapted to improve the performance of ANNs for our difficult-for-machines transforms.


Tracking Without Re-recognition in Humans and Machines

arXiv.org Artificial Intelligence

Imagine trying to track one particular fruitfly in a swarm of hundreds. Higher biological visual systems have evolved to track moving objects by relying on both appearance and motion features. We investigate if state-of-the-art deep neural networks for visual tracking are capable of the same. For this, we introduce PathTracker, a synthetic visual challenge that asks human observers and machines to track a target object in the midst of identical-looking "distractor" objects. While humans effortlessly learn PathTracker and generalize to systematic variations in task design, state-of-the-art deep networks struggle. To address this limitation, we identify and model circuit mechanisms in biological brains that are implicated in tracking objects based on motion cues. When instantiated as a recurrent network, our circuit model learns to solve PathTracker with a robust visual strategy that rivals human performance and explains a significant proportion of their decision-making on the challenge. We also show that the success of this circuit model extends to object tracking in natural videos. Adding it to a transformer-based architecture for object tracking builds tolerance to visual nuisances that affect object appearance, resulting in a new state-of-the-art performance on the large-scale TrackingNet object tracking challenge. Our work highlights the importance of building artificial vision models that can help us better understand human vision and improve computer vision.


Neural Dynamics of Motion Segmentation and Grouping

Neural Information Processing Systems

A neural network model of motion segmentation by visual cortex is described. The model clarifies how preprocessing of motion signals by a Motion Oriented Contrast Filter (MOC Filter) is joined to long-range cooperative motion mechanisms in a motion Cooperative Competitive Loop (CC Loop) to control phenomena such as induced motion, motion capture, and motion aftereffects. The total model system is a motion Boundary Contour System (BCS) that is computed in parallel with a static BCS before both systems cooperate to generate a boundary representation for three-dimensional visual form perception. The present investigations clarify how the static BCS can be modified for use in motion segmentation problems, notably for analyzing how ambiguous local movements (the aperture problem) on a complex moving shape are suppressed and actively reorganized into a coherent global motion signal. 1 INTRODUCTION: WHY ARE STATIC AND MOTION BOUNDARY CONTOUR SYSTEMS NEEDED? Some regions, notably MT, of visual cortex are specialized for motion processing. However, even the earliest stages of visual cortex processing, such as simple cells in V1, require stimuli that change through time for their maximal activation and are direction-sensitive. Why has evolution generated regions such as MT, when even V1 is change-sensitive and direction-sensitive? What computational properties are achieved by MT that are not already available in V1?

