Country
Figure of Merit Training for Detection and Spotting
Chang, Eric I., Lippmann, Richard P.
Spotting tasks require detection of target patterns from a background of richly varied non-target inputs. The performance measure of interest for these tasks, called the figure of merit (FOM), is the detection rate for target patterns when the false alarm rate is in an acceptable range. A new approach to training spotters is presented which computes the FOM gradient for each input pattern and then directly maximizes the FOM using b ackpropagati on. This eliminates the need for thresholds during training. It also uses network resources to model Bayesian a posteriori probability functions accurately only for patterns which have a significant effect on the detection accuracy over the false alarm rate of interest. FOM training increased detection accuracy by 5 percentage points for a hybrid radial basis function (RBF) - hidden Markov model (HMM) wordspotter on the credit-card speech corpus.
Analysis of Short Term Memories for Neural Networks
Principe, Jose C., Hsu, Hui-H., Kuo, Jyh-Ming
Time varying signals, natural or man made, carry information in their time structure. The problem is then one of devising methods and topologies (in the case of interest here, neural topologies) that explore information along time.This problem can be appropriately called temporal pattern recognition, as opposed to the more traditional case of static pattern recognition. In static pattern recognition an input is represented by a point in a space with dimensionality given by the number of signal features, while in temporal pattern recognition the inputs are sequence of features. These sequence of features can also be thought as a point but in a vector space of increasing dimensionality. Fortunately the recent history of the input signal is the one that bears more information to the decision making, so the effective dimensionality is finite but very large and unspecified a priori.
Bayesian Self-Organization
Yuille, Alan L., Smirnakis, Stelios M., Xu, Lei
Recent work by Becker and Hinton (Becker and Hinton, 1992) shows a promising mechanism, based on maximizing mutual information assuming spatial coherence, by which a system can selforganize itself to learn visual abilities such as binocular stereo. We introduce a more general criterion, based on Bayesian probability theory, and thereby demonstrate a connection to Bayesian theories of visual perception and to other organization principles for early vision (Atick and Redlich, 1990). Methods for implementation using variants of stochastic learning are described and, for the special case of linear filtering, we derive an analytic expression for the output. 1 Introduction The input intensity patterns received by the human visual system are typically complicated functions of the object surfaces and light sources in the world. It *Lei Xu was a research scholar in the Division of Applied Sciences at Harvard University while this work was performed. Thus the visual system must be able to extract information from the input intensities that is relatively independent of the actual intensity values.
Dual Mechanisms for Neural Binding and Segmentation
We propose that the binding and segmentation of visual features is mediated by two complementary mechanisms; a low resolution, spatial-based, resource-free process and a high resolution, temporal-based, resource-limited process. In the visual cortex, the former depends upon the orderly topographic organization in striate and extrastriate areas while the latter may be related to observed temporal relationships between neuronal activities. Computer simulations illustrate the role the two mechanisms play in figure/ ground discrimination, depth-from-occlusion, and the vividness of perceptual completion.
Feature Densities are Required for Computing Feature Correspondences
The feature correspondence problem is a classic hurdle in visual object-recognition concerned with determining the correct mapping between the features measured from the image and the features expected by the model. In this paper we show that determining good correspondences requires information about the joint probability density over the image features. We propose "likelihood based correspondence matching" as a general principle for selecting optimal correspondences. The approach is applicable to nonrigid models, allows nonlinear perspective transformations, and can optimally deal with occlusions and missing features.
A Network Mechanism for the Determination of Shape-From-Texture
We propose a computational model for how the cortex discriminates shape and depth from texture. The model consists of four stages: (1) extraction of local spatial frequency, (2) frequency characterization, (3) detection of texture compression by normalization, and (4) integration of the normalized frequency over space. The model accounts for a number of psychophysical observations including experiments based on novel random textures. These textures are generated from white noise and manipulated in Fourier domain in order to produce specific frequency spectra. Simulations with a range of stimuli, including real images, show qualitative and quantitative agreement with human perception. 1 INTRODUCTION There are several physical cues to shape and depth which arise from changes in projection as a surface curves away from view, or recedes in perspective.
Classifying Hand Gestures with a View-Based Distributed Representation
Darrell, Trevor J., Pentland, Alex P.
We present a method for learning, tracking, and recognizing human hand gestures recorded by a conventional CCD camera without any special gloves or other sensors. A view-based representation is used to model aspects of the hand relevant to the trained gestures, and is found using an unsupervised clustering technique. We use normalized correlation networks, with dynamic time warping in the temporal domain, as a distance function for unsupervised clustering. Views are computed separably for space and time dimensions; the distributed response of the combination of these units characterizes the input data with a low dimensional representation. A supervised classification stage uses labeled outputs of the spatiotemporal units as training data. Our system can correctly classify gestures in real time with a low-cost image processing accelerator.
Globally Trained Handwritten Word Recognizer using Spatial Representation, Convolutional Neural Networks, and Hidden Markov Models
Bengio, Yoshua, LeCun, Yann, Henderson, Donnie
We introduce a new approach for online recognition of handwritten words written in unconstrained mixed style. The preprocessor performs a word-level normalization by fitting a model of the word structure using the EM algorithm. Words are then coded into low resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolution network which can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors. 1 Introduction Natural handwriting is often a mixture of different "styles", lower case printed, upper case, and cursive.
Event-Driven Simulation of Networks of Spiking Neurons
A fast event-driven software simulator has been developed for simulating large networks of spiking neurons and synapses. The primitive network elements are designed to exhibit biologically realistic behaviors, such as spiking, refractoriness, adaptation, axonal delays, summation of post-synaptic current pulses, and tonic current inputs. The efficient event-driven representation allows large networks to be simulated in a fraction of the time that would be required for a full compartmental-model simulation. Corresponding analog CMOS VLSI circuit primitives have been designed and characterized, so that large-scale circuits may be simulated prior to fabrication. 1 Introduction Artificial neural networks typically use an abstraction of real neuron behaviour, in which the continuously varying mean firing rate of the neuron is presumed to carry the information about the neuron's time-varying state of excitation [1]. This useful simplification allows the neuron's state to be represented as a time-varying continuous-amplitude quantity.