A Self-Organizing Integrated Segmentation and Recognition Neural Net
Keeler, Jim, Rumelhart, David E.
Standard pattern recognition systems usually involve a segmentation step prior to the recognition step. For example, it is very common in character recognition to segment characters in a pre-processing step, then normalize the individual characters and pass them to a recognition engine such as a neural network, as in the work of LeCun et al. (1988) and Martin and Pittman (1988). This separation between segmentation and recognition becomes unreliable if the characters are touching each other, touching bounding boxes, broken, or noisy. Other applications such as scene analysis or continuous speech recognition pose similar and more severe segmentation problems. The difficulties encountered in these applications present an apparent dilemma: one cannot recognize the patterns
Multi-Digit Recognition Using a Space Displacement Neural Network
Matan, Ofer, Burges, Christopher J. C., LeCun, Yann, Denker, John S.
Ofer Matan, Christopher J.C. Burges, Yann Le Cun and John S. Denker, AT&T Bell Laboratories, Holmdel, N.J. 07733. Abstract: We present a feed-forward network architecture for recognizing an unconstrained handwritten multi-digit string. This is an extension of previous work on recognizing isolated digits. The output layer of the network is coupled to a Viterbi alignment module that chooses the best interpretation of the input. Training errors are propagated through the Viterbi module. The novelty in this procedure is that segmentation is done on the feature maps developed in the Space Displacement Neural Network (SDNN) rather than the input (pixel) space. 1 Introduction: In previous work (Le Cun et al., 1990) we have demonstrated a feed-forward backpropagation network that recognizes isolated handwritten digits at state-of-the-art performance levels.
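The role of a Viterbi alignment step can be illustrated with a generic decoder. This is only a sketch under stated assumptions, not the module described in the paper: the three states, the per-position scores, and the transition values below are all made up for illustration, and the propagation of training errors through the alignment is not shown.

```python
import math

def viterbi(log_scores, log_trans):
    """Return the highest-scoring state path.

    log_scores[t][s]: log-score of state s at window position t.
    log_trans[p][s]:  log-score of moving from state p to state s.
    """
    n_states = len(log_scores[0])
    best = list(log_scores[0])          # best score of a path ending in each state
    back = []                           # back-pointers, one list per later position
    for frame in log_scores[1:]:
        prev_best, best, ptr = best, [], []
        for s in range(n_states):
            p = max(range(n_states), key=lambda q: prev_best[q] + log_trans[q][s])
            best.append(prev_best[p] + log_trans[p][s] + frame[s])
            ptr.append(p)
        back.append(ptr)
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for ptr in reversed(back):          # trace the best path backwards
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def collapse(path, blank=0):
    """Merge runs of the same state and drop the blank state."""
    out, prev = [], blank
    for s in path:
        if s != blank and s != prev:
            out.append(s)
        prev = s
    return out

# Hypothetical setup: state 0 = "no digit", state 1 = "3", state 2 = "7".
LABELS = ["-", "3", "7"]
scores = [                              # made-up network outputs per position
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.45, 0.50],                 # ambiguous position
    [0.90, 0.05, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.05, 0.90],
]
log_scores = [[math.log(p) for p in frame] for frame in scores]
# Assumed transitions: staying in a state is more likely than switching.
log_trans = [[math.log(0.6 if p == s else 0.2) for s in range(3)]
             for p in range(3)]

framewise = [max(range(3), key=lambda s: frame[s]) for frame in scores]
path = viterbi(log_scores, log_trans)
framewise_text = "".join(LABELS[s] for s in collapse(framewise))
decoded_text = "".join(LABELS[s] for s in collapse(path))
print(framewise_text, decoded_text)
```

With these made-up numbers, reading each position independently gives "377", while the aligned path gives "37": the transition scores outweigh the weak evidence at the ambiguous position.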
3D Object Recognition Using Unsupervised Feature Extraction
Intrator, Nathan, Gold, Joshua I., Bülthoff, Heinrich H., Edelman, Shimon
Joshua I. Gold, Center for Neural Science, Brown University, Providence, RI 02912, USA. Shimon Edelman, Dept. of Applied Mathematics and Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel. Abstract: Intrator (1990) proposed a feature extraction method that is related to recent statistical theory (Huber, 1985; Friedman, 1987), and is based on a biologically motivated model of neuronal plasticity (Bienenstock et al., 1982). This method has been recently applied to feature extraction in the context of recognizing 3D objects from single 2D views (Intrator and Gold, 1991). Here we describe experiments designed to analyze the nature of the extracted features, and their relevance to the theory and psychophysics of object recognition. 1 Introduction: Results of recent computational studies of visual recognition (e.g., Poggio and Edelman, 1990) indicate that the problem of recognition of 3D objects can be effectively reformulated in terms of standard pattern classification theory. According to this approach, an object is represented by a few of its 2D views, encoded as clusters in multidimensional space. Recognition of a novel view is then carried out by interpolating among the stored views in the representation space.
Linear Operator for Object Recognition
Visual object recognition involves the identification of images of 3-D objects seen from arbitrary viewpoints. We suggest an approach to object recognition in which a view is represented as a collection of points given by their location in the image. An object is modeled by a set of 2-D views together with the correspondence between the views. We show that any novel view of the object can be expressed as a linear combination of the stored views. Consequently, we build a linear operator that distinguishes between views of a specific object and views of other objects.
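The linear-combination claim can be checked numerically in a small sketch. The abstract does not state the imaging model, so the code assumes orthographic projection and rigid rotation without translation; the 3-D points, the poses, and all helper names are hypothetical. Under those assumptions, the x-coordinates of a novel view lie in the span of (x, y) from one stored view and x from a second, so coefficients fitted on three corresponding points should predict the remaining points of the same object but not those of a different object.

```python
import math

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def project(rot, pts):
    """Orthographic view (assumed model): rotate, keep (x, y), drop depth."""
    return [tuple(sum(rot[r][k] * p[k] for k in range(3)) for r in range(2))
            for p in pts]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(a)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        coeffs.append(det3(m) / d)
    return coeffs

def residual(stored1, stored2, novel):
    """Fit novel x-coordinates on 3 points; return the max error on the rest."""
    basis = [[v1[0], v1[1], v2[0]] for v1, v2 in zip(stored1, stored2)]
    target = [v[0] for v in novel]
    coeffs = solve3(basis[:3], target[:3])
    return max(abs(sum(c * f for c, f in zip(coeffs, row)) - t)
               for row, t in zip(basis[3:], target[3:]))

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical point sets for the modeled object and for a different object.
obj = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
       [1, 1, 1], [0.5, -0.3, 0.8], [-0.7, 0.2, 0.4]]
other = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
         [2, -1, 0.5], [0.1, 0.9, -0.4], [-0.3, -0.6, 1.2]]

view1 = project(IDENTITY, obj)               # two stored views of the object
view2 = project(rot_y(0.5), obj)
novel_pose = matmul(rot_y(-0.7), rot_x(0.3))

same_err = residual(view1, view2, project(novel_pose, obj))
other_err = residual(view1, view2, project(novel_pose, other))
print(same_err, other_err)
```

The residual for the modeled object is at the level of floating-point round-off, while the residual for the other object is of order one. The y-coordinates can be treated the same way with their own coefficients.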
Combined Neural Network and Rule-Based Framework for Probabilistic Pattern Recognition and Discovery
Greenspan, Hayit K., Goodman, Rodney, Chellappa, Rama
A combined neural network and rule-based approach is suggested as a general framework for pattern recognition. This approach enables unsupervised and supervised learning, respectively, while providing probability estimates for the output classes. The probability maps are utilized for higher level analysis such as a feedback for smoothing over the output label maps and the identification of unknown patterns (pattern "discovery"). The suggested approach is presented and demonstrated in the texture-analysis task. A correct classification rate in the 90 percentile is achieved for both unstructured and structured natural texture mosaics. The advantages of the probabilistic approach to pattern analysis are demonstrated.
Learning to Segment Images Using Dynamic Feature Binding
Mozer, Michael C., Zemel, Richard S., Behrmann, Marlene
Despite the fact that complex visual scenes contain multiple, overlapping objects, people perform object recognition with ease and accuracy. One operation that facilitates recognition is an early segmentation process in which features of objects are grouped and labeled according to which object they belong. Current computational systems that perform this operation are based on predefined grouping heuristics.
Illumination and View Position in 3D Visual Recognition
It is shown that both changes in viewing position and illumination conditions can be compensated for, prior to recognition, using combinations of images taken from different viewing positions and different illumination conditions. It is also shown that, in agreement with psychophysical findings, the computation requires at least a sign-bit image as input; contours alone are not sufficient. 1 Introduction: The task of visual recognition is natural and effortless for biological systems, yet the problem of recognition has proven very difficult to analyze from a computational point of view. The fundamental reason is that novel images of familiar objects are often not sufficiently similar to previously seen images of that object. Assuming a rigid and isolated object in the scene, there are two major sources for this variability: geometric and photometric. The geometric source of variability comes from changes of view position. A 3D object can be viewed from a variety of directions, each resulting in a different 2D projection. The difference is significant, even for modest changes in viewing position, and can be demonstrated by superimposing those projections (see Figure 1, first row second image). Much attention has been given to this problem in the visual recognition literature ([9], and references therein), and recent results show that one can compensate for changes in viewing position by generating novel views from a small number of model views of the object [10, 4, 8].
Markov Random Fields Can Bridge Levels of Abstraction
Cooper, Paul R., Prokopowicz, Peter N.
Network vision systems must make inferences from evidential information across levels of representational abstraction, from low level invariants, through intermediate scene segments, to high level behaviorally relevant object descriptions. This paper shows that such networks can be realized as Markov Random Fields (MRFs). We show first how to construct an MRF functionally equivalent to a Hough transform parameter network, thus establishing a principled probabilistic basis for visual networks. Second, we show that these MRF parameter networks are more capable and flexible than traditional methods. In particular, they have a well-defined probabilistic interpretation, intrinsically incorporate feedback, and offer richer representations and decision capabilities.
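For readers unfamiliar with the Hough transform that the MRF is said to be equivalent to, the classical voting scheme for straight lines can be sketched as follows. This shows only the traditional accumulator, not the paper's MRF construction; the line parameterization, the one-degree angle step, the rounding of rho, and the sample points are assumptions chosen for illustration.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180):
    """Vote for lines x*cos(theta) + y*sin(theta) = rho.

    Returns a Counter mapping (theta index, rounded rho) to a vote count.
    With n_theta=180 the theta index is the angle in degrees.
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho))] += 1     # each point votes once per angle
    return votes

# Hypothetical edge points, all lying on the horizontal line y = 2.
points = [(0, 2), (10, 2), (20, 2), (30, 2), (40, 2)]
votes = hough_lines(points)
(best_theta, best_rho), count = votes.most_common(1)[0]
print(best_theta, best_rho, count)
```

Each image point casts one vote per candidate angle, and the parameter cell that collects votes from all five points (theta = 90 degrees, rho = 2) identifies the line. In a parameter network, each accumulator cell would correspond to one unit.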