Itti, Laurent
An Object-Based Bayesian Framework for Top-Down Visual Attention
Borji, Ali (University of Southern California) | Sihite, Dicky N. (University of Southern California) | Itti, Laurent (University of Southern California)
We introduce a new task-independent framework to model top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions, which are derived either from manual annotations of objects in video scenes or from state-of-the-art object detection models. Evaluating over approximately 3 hours of data (about 315,000 eye fixations and 12,600 saccades) from observers playing 3 video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations than: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multi-modal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective at using and reasoning over spatio-temporal visual data.
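The abstract does not spell out the network structure. As a minimal sketch, assume the attended object is a single discrete hidden variable evolving as a first-order Markov chain, the simplest DBN. Assume also that per-frame object-related features yield a likelihood for each object. Recursive forward filtering then gives a distribution over attended objects at every frame. Weighting object centers by that distribution gives an expected gaze location. The function `forward_filter`, the transition matrix, the likelihoods and the object centers are all hypothetical illustrations, not the authors' model.

```python
import numpy as np


def forward_filter(prior, transition, likelihoods):
    """Forward filtering over a discrete hidden 'attended object' variable.

    prior:       (K,) belief over objects before the first frame
    transition:  (K, K) with transition[i, j] = P(obj_t = j | obj_{t-1} = i)
    likelihoods: (T, K) with likelihoods[t, k] = P(evidence_t | obj_t = k)
    Returns a (T, K) array of filtered posteriors P(obj_t | evidence_1..t).
    """
    belief = np.asarray(prior, dtype=float)
    posteriors = []
    for lik in np.asarray(likelihoods, dtype=float):
        predicted = belief @ transition   # propagate belief through the dynamics
        unnorm = predicted * lik          # weight by how well each object explains the frame
        belief = unnorm / unnorm.sum()    # renormalize to a probability distribution
        posteriors.append(belief)
    return np.array(posteriors)


# Hypothetical example: 3 objects, "sticky" attention (stay with prob. 0.8).
prior = np.full(3, 1.0 / 3.0)
transition = np.full((3, 3), 0.1) + 0.7 * np.eye(3)
likelihoods = np.array([[0.8, 0.1, 0.1],
                        [0.7, 0.2, 0.1]])
posteriors = forward_filter(prior, transition, likelihoods)

# Hypothetical object centers in screen pixels (x, y); expected gaze = weighted mean.
object_centers = np.array([[100.0, 50.0], [300.0, 200.0], [500.0, 400.0]])
expected_gaze = posteriors[-1] @ object_centers
print(posteriors[-1].round(3), expected_gaze.round(1))
```

A richer DBN would add more hidden and observed nodes per time slice. The predict-then-update recursion stays the same.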
Congruence between model and human attention reveals unique signatures of critical visual events
Peters, Robert, Itti, Laurent
Current computational models of bottom-up and top-down components of attention are predictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, to date there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, and looking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibit reliable temporal signatures during critical event windows in the task sequence. For example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically, we find that an event detector based on fused behavioral and stimulus information (in the form of the model's predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone (model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings.
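The abstract measures a model's "predictive strength" at the observer's eye position but does not name the metric. One common choice is normalized scanpath saliency (NSS): z-score the prediction map and read its value at the gaze point. The sketch below assumes NSS. It also adds a hypothetical detector that flags frames where the smoothed difference between salience and relevance strength exceeds a threshold, which mirrors the flight-combat signature described above (salience up, relevance down). The function names, window size and threshold are placeholders, not the authors' detectors.

```python
import numpy as np


def nss(prediction_map, eye_xy):
    """Normalized scanpath saliency: the z-scored map value at the gaze point."""
    m = np.asarray(prediction_map, dtype=float)
    std = m.std()
    if std == 0:                      # a flat map carries no predictive information
        return 0.0
    x, y = eye_xy                     # column, row in map coordinates
    return float((m[y, x] - m.mean()) / std)


def detect_events(salience_strength, relevance_strength, window=5, threshold=1.0):
    """Flag frames where salience gains and relevance loses predictive strength.

    Hypothetical detector: smooth the per-frame difference with a centered
    moving average and threshold it. Window and threshold are placeholders.
    """
    diff = np.asarray(salience_strength, float) - np.asarray(relevance_strength, float)
    smoothed = np.convolve(diff, np.ones(window) / window, mode="same")
    return smoothed > threshold


# Toy prediction map with a single hot spot.
toy_map = np.zeros((10, 10))
toy_map[5, 5] = 1.0
on_target, off_target = nss(toy_map, (5, 5)), nss(toy_map, (0, 0))

# Synthetic 30-frame session: frames 10-19 mimic the engagement signature.
sal = np.zeros(30)
rel = np.zeros(30)
sal[10:20] = 2.0
rel[10:20] = -1.0
flags = detect_events(sal, rel)
print(on_target, off_target, np.flatnonzero(flags))
```

In this sketch the per-frame NSS values of the salience and relevance maps would form the two input series to `detect_events`. That fusion of eye position with model output is what distinguishes it from detectors built on gaze or maps alone.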
Optimal cue selection strategy
Navalpakkam, Vidhya, Itti, Laurent
Bayesian Surprise Attracts Human Attention
Itti, Laurent, Baldi, Pierre F.
The concept of surprise is central to sensory processing, adaptation, learning, and attention. Yet, no widely accepted mathematical theory currently exists to quantitatively characterize surprise elicited by a stimulus or event, for observers that range from single neurons to complex natural or engineered systems.
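The excerpt stops before the definition. In Itti and Baldi's formulation, surprise is the Kullback-Leibler divergence between an observer's posterior and prior beliefs over a space of models M after observing data D, that is S(D) = KL(P(M|D) || P(M)). The sketch below assumes a small discrete model space (three hypothetical coin biases), so the divergence is a finite sum. The function `update_and_surprise` is an illustration, not the paper's implementation.

```python
import numpy as np


def update_and_surprise(prior, likelihood):
    """Bayes-update a discrete belief and return (posterior, surprise in bits).

    Surprise is KL(posterior || prior) = sum_M P(M|D) log2(P(M|D) / P(M)).
    """
    prior = np.asarray(prior, dtype=float)
    posterior = prior * np.asarray(likelihood, dtype=float)
    posterior = posterior / posterior.sum()
    nz = posterior > 0                # 0 * log 0 is taken as 0
    surprise = float(np.sum(posterior[nz] * np.log2(posterior[nz] / prior[nz])))
    return posterior, surprise


# Hypothetical model space: three coins with P(heads) = 0.2, 0.5, 0.8.
p_heads = np.array([0.2, 0.5, 0.8])
belief = np.full(3, 1.0 / 3.0)

# "Heads" is informative; an observation every model predicts equally is not.
belief_after, s_heads = update_and_surprise(belief, p_heads)
_, s_flat = update_and_surprise(belief, np.full(3, 0.5))
print(round(s_heads, 4), s_flat)
```

Feeding each returned posterior back in as the next prior gives a running surprise signal. Data that no longer shift beliefs stop being surprising, even if they remain improbable.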
Modeling the Modulatory Effect of Attention on Human Spatial Vision
Itti, Laurent, Braun, Jochen, Koch, Christof
We present new simulation results, in which a computational model of interacting visual neurons simultaneously predicts the modulation of spatial vision thresholds by focal visual attention, for five dual-task human psychophysics experiments. This new study complements our previous findings that attention activates a winner-take-all competition among early visual neurons within one cortical hypercolumn. This "intensified competition" hypothesis assumed that attention equally affects all neurons, and yielded two single-unit predictions: an increase in gain and a sharpening of tuning with attention. While both effects have been separately observed in electrophysiology, no single-unit study has yet shown them simultaneously. Hence, we here explore whether our model could still predict our data if attention only modulated neuronal gain, but did so non-uniformly across neurons and tasks. Specifically, we investigate whether modulating the gain of only the neurons that are loudest, best-tuned, or most informative about the stimulus, or of all neurons equally but in a task-dependent manner, may account for the data. We find that none of these hypotheses yields predictions as plausible as the intensified competition hypothesis, hence providing additional support for our original findings.
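The two single-unit predictions can be illustrated with a toy divisive-normalization population. The toy assumes Gaussian orientation tuning, uniform pooling over all units, and responses R_i = g E_i^γ / (S^δ + Σ_j E_j^δ). Scaling the gain g alone raises responses without changing the shape of the population profile. Raising the exponents γ and δ, used here as a stand-in for intensified competition, both raises the peak and narrows the profile. All parameter values are hypothetical, and this is not the authors' fitted model.

```python
import numpy as np

# Hypothetical population: preferred orientations every 5 deg (wrap-around ignored).
PREFS = np.linspace(-90.0, 90.0, 37)


def linear_drive(stim_deg, sigma=20.0):
    """Gaussian orientation tuning of the linear filter stage."""
    d = PREFS - stim_deg
    return np.exp(-d**2 / (2.0 * sigma**2))


def normalized_response(E, gamma=2.0, delta=2.0, S=1.0, gain=1.0):
    """Divisive normalization: gain * E_i^gamma / (S^delta + sum_j E_j^delta)."""
    return gain * E**gamma / (S**delta + np.sum(E**delta))


def half_width(resp):
    """Range of preferred orientations whose response is >= half the peak."""
    above = PREFS[resp >= resp.max() / 2.0]
    return above.max() - above.min()


E = linear_drive(0.0)
baseline = normalized_response(E)
gain_only = normalized_response(E, gain=1.5)                # uniform gain change
competition = normalized_response(E, gamma=4.0, delta=4.0)  # stronger competition

for name, r in [("baseline", baseline), ("gain only", gain_only), ("competition", competition)]:
    print(f"{name:12s} peak={r.max():.3f} half-width={half_width(r):.0f} deg")
```

In this toy, only the competition change produces both a higher peak and a narrower profile, which is why gain-only variants have to be tested separately.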
Attentional Modulation of Human Pattern Discrimination Psychophysics Reproduced by a Quantitative Model
Itti, Laurent, Braun, Jochen, Lee, Dale K., Koch, Christof
We previously proposed a quantitative model of early visual processing in primates, based on non-linearly interacting visual filters and statistically efficient decision. We now use this model to interpret the observed modulation of a range of human psychophysical thresholds with and without focal visual attention. Our model, calibrated by an automatic fitting procedure, simultaneously reproduces thresholds for four classical pattern discrimination tasks, performed while attention was engaged by another concurrent task. Our model then predicts that the seemingly complex improvements of certain thresholds, which we observed when attention was fully available for the discrimination tasks, can best be explained by a strengthening of competition among early visual filters.
1 INTRODUCTION
What happens when we voluntarily focus our attention on a restricted part of our visual field? We here investigate the possibility that attention might have a specific computational modulatory effect on early visual processing.
Attentional Modulation of Human Pattern Discrimination Psychophysics Reproduced by a Quantitative Model
Itti, Laurent, Braun, Jochen, Lee, Dale K., Koch, Christof
We previously proposed a quantitative model of early visual processing in primates, based on non-linearly interacting visual filters and statistically efficient decision. We now use this model to interpret the observed modulation of a range of human psychophysical thresholds with and without focal visual attention. Our model, calibrated by an automatic fitting procedure, simultaneously reproduces thresholds for four classical pattern discrimination tasks, performed while attention was engaged by another concurrent task. Our model then predicts that the seemingly complex improvements of certain thresholds, which we observed when attention was fully available for the discrimination tasks, can best be explained by a strengthening of competition among early visual filters.
1 INTRODUCTION
What happens when we voluntarily focus our attention on a restricted part of our visual field? Focal attention is often thought of as a gating mechanism, which selectively allows a certain spatial location and certain types of visual features to reach higher visual processes.
A Model of Early Visual Processing
Itti, Laurent, Braun, Jochen, Lee, Dale K., Koch, Christof
We propose a model for early visual processing in primates. The model consists of a population of linear spatial filters which interact through nonlinear excitatory and inhibitory pooling. Statistical estimation theory is then used to derive human psychophysical thresholds from the responses of the entire population of units. The model is able to reproduce human thresholds for contrast and orientation discrimination tasks, and to predict contrast thresholds in the presence of masks of varying orientation and spatial frequency.
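One standard way to turn population responses into a discrimination threshold with statistical estimation theory is through Fisher information. Assuming independent Poisson noise on mean spike counts f_i(θ), the Fisher information is J(θ) = Σ_i f_i'(θ)² / f_i(θ). An efficient unbiased estimator then has standard deviation 1/√J (the Cramér-Rao bound), so the threshold at criterion d' is d'/√J. The sketch skips the nonlinear pooling stage and uses Gaussian tuning curves directly. The tuning curves, noise model, parameters and function names are illustrative assumptions; the abstract does not state the exact estimator or noise model used.

```python
import numpy as np

# Hypothetical population: preferred orientations every 5 deg (wrap-around ignored).
PREFS = np.linspace(-90.0, 90.0, 37)


def population(theta, r_max=50.0, sigma=20.0, baseline=1.0):
    """Mean spike counts f_i(theta) and their derivatives f_i'(theta)."""
    d = theta - PREFS
    g = np.exp(-d**2 / (2.0 * sigma**2))
    f = baseline + r_max * g
    f_prime = -r_max * d / sigma**2 * g
    return f, f_prime


def fisher_information(theta, **tuning):
    """Fisher information for independent Poisson counts: sum_i f_i'^2 / f_i."""
    f, f_prime = population(theta, **tuning)
    return float(np.sum(f_prime**2 / f))


def discrimination_threshold(theta, d_prime=1.0, **tuning):
    """Change in theta that an efficient observer discriminates at criterion d'."""
    return d_prime / np.sqrt(fisher_information(theta, **tuning))


print(discrimination_threshold(0.0))               # default hypothetical population
print(discrimination_threshold(0.0, r_max=100.0))  # stronger responses, lower threshold
```

Contrast thresholds could in principle be derived the same way by differentiating the responses with respect to contrast instead of orientation. This sketch only covers orientation.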