Modeling Saccadic Targeting in Visual Search
Rao, Rajesh P. N., Zelinsky, Gregory J., Hayhoe, Mary M., Ballard, Dana H.
Visual cognition depends critically on the ability to make rapid eye movements known as saccades that orient the fovea over targets of interest in a visual scene. Saccades are known to be ballistic: the pattern of muscle activation for foveating a prespecified target location is computed prior to the movement and visual feedback is precluded. Despite these distinctive properties, there has been no general model of the saccadic targeting strategy employed by the human visual system during visual search in natural scenes. This paper proposes a model for saccadic targeting that uses iconic scene representations derived from oriented spatial filters at multiple scales. Visual search proceeds in a coarse-to-fine fashion with the largest scale filter responses being compared first. The model was empirically tested by comparing its performance with actual eye movement data from human subjects in a natural visual search task; preliminary results indicate substantial agreement between eye movements predicted by the model and those recorded from human subjects.
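The coarse-to-fine comparison described above can be illustrated with a minimal sketch. It assumes the filter responses have already been computed and stored per location and per scale; the function name, the data layout, and the squared-distance mismatch are illustrative assumptions, not the paper's actual formulation.

```python
def coarse_to_fine_fixations(scene, target):
    """Pick one fixation per search stage, adding finer scales at each stage.

    scene:  dict mapping a location (x, y) to a list of response vectors,
            one vector per scale, index 0 being the coarsest scale
            (hypothetical layout chosen for this sketch).
    target: list of response vectors for the search target, same ordering.
    """
    fixations = []
    for used in range(1, len(target) + 1):
        # Mismatch summed over the `used` coarsest scales only.
        def mismatch(loc):
            return sum(
                (a - b) ** 2
                for s in range(used)
                for a, b in zip(scene[loc][s], target[s])
            )
        fixations.append(min(scene, key=mismatch))
    return fixations


# Toy example: two scales, one response value per scale.
target = [[1.0], [0.0]]
scene = {
    (0, 0): [[1.0], [1.0]],  # matches the target at the coarse scale only
    (5, 5): [[0.9], [0.0]],  # close at the coarse scale, exact at the fine scale
    (9, 1): [[0.0], [0.0]],  # matches at the fine scale only
}
fixations = coarse_to_fine_fixations(scene, target)
```

In this toy scene the first fixation lands on the location that best matches the coarse responses, and the second moves to the location that best matches once the finer scale is included.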
Classifying Facial Action
Bartlett, Marian Stewart, Viola, Paul A., Sejnowski, Terrence J., Golomb, Beatrice A., Larsen, Jan, Hager, Joseph C., Ekman, Paul
The Facial Action Coding System (FACS), devised by Ekman and Friesen (1978), provides an objective means for measuring the facial muscle contractions involved in a facial expression. In this paper, we approach automated facial expression analysis by detecting and classifying facial actions. We generated a database of over 1100 image sequences of 24 subjects performing over 150 distinct facial actions or action combinations. We compare three different approaches to classifying the facial actions in these images: holistic spatial analysis based on principal components of graylevel images; explicit measurement of local image features such as wrinkles; and template matching with motion flow fields. On a dataset containing six individual actions and 20 subjects, these methods achieved performance of 89%, 57%, and 85%, respectively, for generalization to novel subjects. When combined, performance improved to 92%.
Learning to Predict Visibility and Invisibility from Occlusion Events
Marshall, Jonathan A., Alley, Richard K., Hubbard, Robert S.
This paper presents a self-organizing neural network that learns to detect, represent, and predict the visibility and invisibility relationships that arise during occlusion events, after a period of exposure to motion sequences containing occlusion and disocclusion events. The network develops two parallel opponent channels or "chains" of lateral excitatory connections for every resolvable motion trajectory. One channel, the "On" chain or "visible" chain, is activated when a moving stimulus is visible. The other channel, the "Off" chain or "invisible" chain, carries a persistent, amodal representation that predicts the motion of a formerly visible stimulus that becomes invisible due to occlusion. The learning rule uses disinhibition from the On chain to trigger learning in the Off chain.
Unsupervised Pixel-prediction
When a sensory system constructs a model of the environment from its input, it might need to verify the model's accuracy. One method of verification is multivariate time-series prediction: a good model could predict the near-future activity of its inputs, much as a good scientific theory predicts future data. Such a predicting model would require copious top-down connections to compare the predictions with the input. That feedback could improve the model's performance in two ways: by biasing internal activity toward expected patterns, and by generating specific error signals if the predictions fail. A proof-of-concept model (an event-driven, computationally efficient layered network incorporating "cortical" features like all-excitatory synapses and local inhibition) was constructed to make near-future predictions of a simple, moving stimulus. After unsupervised learning, the network contained units tuned not only to obvious features of the stimulus like contour orientation and motion, but also to contour discontinuity ("end-stopping") and illusory contours.
Control of Selective Visual Attention: Modeling the "Where" Pathway
Intermediate and higher vision processes require selection of a subset of the available sensory information before further processing. Usually, this selection is implemented in the form of a spatially circumscribed region of the visual field, the so-called "focus of attention", which scans the visual scene depending on the input and on the attentional state of the subject. Here we present a model for the control of the focus of attention in primates, based on a saliency map. This mechanism is not only expected to model the functionality of biological vision but also to be essential for the understanding of complex scenes in machine vision.
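As a rough illustration of how a saliency map can drive a scanning focus of attention, the sketch below repeatedly selects the most salient location and then suppresses its neighborhood so the focus moves on. The winner-take-all selection and the suppression step (often called inhibition of return) are assumptions made for this sketch; the abstract states only that control is based on a saliency map. The function name and parameters are hypothetical.

```python
def scan_saliency(saliency, n_shifts, inhibit_radius=1):
    """Return the sequence of attended (row, col) locations.

    saliency:       2-D list of saliency values (assumed precomputed).
    n_shifts:       number of attention shifts to simulate.
    inhibit_radius: half-width of the square region suppressed after each
                    shift (an assumption of this sketch).
    """
    s = [row[:] for row in saliency]  # work on a copy
    rows, cols = len(s), len(s[0])
    visited = []
    for _ in range(n_shifts):
        # Winner-take-all: attend to the currently most salient location.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda p: s[p[0]][p[1]],
        )
        visited.append((r, c))
        # Suppress the attended neighborhood so it is not selected again.
        for i in range(max(0, r - inhibit_radius), min(rows, r + inhibit_radius + 1)):
            for j in range(max(0, c - inhibit_radius), min(cols, c + inhibit_radius + 1)):
                s[i][j] = float("-inf")
    return visited


saliency = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.1, 0.6],
]
scanpath = scan_saliency(saliency, n_shifts=2)
```

Here the focus first settles on the peak at (1, 1) and then shifts to the next most salient location outside the suppressed region.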
A Framework for Non-rigid Matching and Correspondence
Pappu, Suguna, Gold, Steven, Rangarajan, Anand
Matching feature point sets lies at the core of many approaches to object recognition. We present a framework for nonrigid matching that begins with a skeleton module, affine point matching, and then integrates multiple features to improve correspondence and develops an object representation based on spatial regions to model local transformations.
The Gamma MLP for Speech Phoneme Recognition
Lawrence, Steve, Tsoi, Ah Chung, Back, Andrew D.
We define a Gamma multi-layer perceptron (MLP) as an MLP with the usual synaptic weights replaced by gamma filters (as proposed by de Vries and Principe, 1992) and associated gain terms throughout all layers. We derive gradient descent update equations and apply the model to the recognition of speech phonemes. We find that both the inclusion of gamma filters in all layers and the inclusion of synaptic gains improve the performance of the Gamma MLP. We compare the Gamma MLP with TDNN, Back-Tsoi FIR MLP, and Back-Tsoi IIR MLP architectures, and a local approximation scheme. We find that the Gamma MLP results in a substantial reduction in error rates.
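To make the idea of a synapse built from a gamma filter concrete, here is a minimal sketch of a single gamma-filter synapse using the standard gamma memory recursion x_k(t) = (1 - mu) x_k(t-1) + mu x_{k-1}(t-1), with x_0(t) equal to the input. Applying the gain as a single scalar multiplier on the weighted sum is an assumption of this sketch, as are the function and parameter names; the paper's exact parameterization may differ.

```python
def gamma_synapse(inputs, weights, mu, gain=1.0):
    """Filter a scalar input sequence through one gamma-filter synapse.

    inputs:  sequence of scalar inputs u(t).
    weights: tap weights w_0..w_K (K is the filter order).
    mu:      memory parameter; mu = 1 reduces to a tapped delay line.
    gain:    scalar gain on the output (assumed form of the gain term).
    """
    order = len(weights) - 1
    taps = [0.0] * (order + 1)  # x_0..x_K, initially at rest
    outputs = []
    for u in inputs:
        prev = taps[:]  # tap values from the previous time step
        taps[0] = u
        for k in range(1, order + 1):
            taps[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
        outputs.append(gain * sum(w * x for w, x in zip(weights, taps)))
    return outputs


# With mu = 1 the synapse behaves like a two-tap FIR filter.
fir_like = gamma_synapse([1.0, 2.0, 3.0], weights=[1.0, 1.0], mu=1.0)
# With mu < 1 an impulse decays geometrically through the first tap.
impulse = gamma_synapse([1.0, 0.0, 0.0, 0.0], weights=[0.0, 1.0], mu=0.5)
```

The two calls show the limiting behaviors: `mu = 1` recovers an ordinary delay line, while smaller values of `mu` trade temporal resolution for a longer memory.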
KODAK IMAGELINK™ OCR Alphanumeric Handprint Module
Shustorovich, Alexander, Thrasher, Christopher W.
There are two neural network algorithms at its core: the first network is trained to find individual characters in an alphanumeric field, while the second one performs the classification. Both networks were trained on Gabor projections of the original pixel images, which resulted in higher recognition rates and greater noise immunity. Compared to its purely numeric counterpart (Shustorovich and Thrasher, 1995), this version of the system has a significant application-specific postprocessing module. The system has been implemented in specialized parallel hardware, which allows it to run at 80 char/sec/board. It has been installed at the Driver and Vehicle Licensing Agency (DVLA) in the United Kingdom.
Selective Attention for Handwritten Digit Recognition
Completely parallel object recognition is NP-complete. Achieving a recognizer with feasible complexity requires a compromise between parallel and sequential processing where a system selectively focuses on parts of a given image, one after another. Successive fixations are generated to sample the image and these samples are processed and abstracted to generate a temporal context in which results are integrated over time. A computational model based on a partially recurrent feedforward network is proposed and made credible by testing on the real-world problem of recognition of handwritten digits with encouraging results.
Handwritten Word Recognition using Contextual Hybrid Radial Basis Function Network/Hidden Markov Models
Lemarié, Bernard, Gilloux, Michel, Leroux, Manuel
A hybrid and contextual radial basis function network/hidden Markov model off-line handwritten word recognition system is presented. The task assigned to the radial basis function networks is the estimation of emission probabilities associated with Markov states. The model is contextual because the estimation of emission probabilities takes into account the left context of the current image segment as represented by its predecessor in the sequence. The new system does not outperform the previous system without context but acts differently.