Asia
3D Object Recognition Using Unsupervised Feature Extraction
Intrator, Nathan, Gold, Joshua I., Bülthoff, Heinrich H., Edelman, Shimon
Intrator (1990) proposed a feature extraction method that is related to recent statistical theory (Huber, 1985; Friedman, 1987), and is based on a biologically motivated model of neuronal plasticity (Bienenstock et al., 1982). This method has been recently applied to feature extraction in the context of recognizing 3D objects from single 2D views (Intrator and Gold, 1991). Here we describe experiments designed to analyze the nature of the extracted features, and their relevance to the theory and psychophysics of object recognition. 1 Introduction Results of recent computational studies of visual recognition (e.g., Poggio and Edelman, 1990) indicate that the problem of recognition of 3D objects can be effectively reformulated in terms of standard pattern classification theory. According to this approach, an object is represented by a few of its 2D views, encoded as clusters in multidimentional space. Recognition of a novel view is then carried out by interpo-460 3D Object Recognition Using Unsupervised Feature Extraction 461 lating among the stored views in the representation space.
Linear Operator for Object Recognition
Visual object recognition involves the identification of images of 3-D objects seen from arbitrary viewpoints. We suggest an approach to object recognition in which a view is represented as a collection of points given by their location in the image. An object is modeled by a set of 2-D views together with the correspondence between the views. We show that any novel view of the object can be expressed as a linear combination of the stored views. Consequently, we build a linear operator that distinguishes between views of a specific object and views of other objects.
Learning to Segment Images Using Dynamic Feature Binding
Mozer, Michael C., Zemel, Richard S., Behrmann, Marlene
Despite the fact that complex visual scenes contain multiple, overlapping objects, people perform object recognition with ease and accuracy. One operation that facilitates recognition is an early segmentation process in which features of objects are grouped and labeled according to which object they belong. Current computational systems that perform this operation are based on predefined grouping heuristics.
Markov Random Fields Can Bridge Levels of Abstraction
Cooper, Paul R., Prokopowicz, Peter N.
Network vision systems must make inferences from evidential information across levels of representational abstraction, from low level invariants, through intermediate scene segments, to high level behaviorally relevant object descriptions. This paper shows that such networks can be realized as Markov Random Fields (MRFs). We show first how to construct an MRF functionally equivalent to a Hough transform parameter network, thus establishing a principled probabilistic basis for visual networks. Second, we show that these MRF parameter networks are more capable and flexible than traditional methods. In particular, they have a well-defined probabilistic interpretation, intrinsically incorporate feedback, and offer richer representations and decision capabilities.
Learning to Make Coherent Predictions in Domains with Discontinuities
Becker, Suzanna, Hinton, Geoffrey E.
We have previously described an unsupervised learning procedure that discovers spatially coherent propertit _; of the world by maximizing the information that parameters extracted from different parts of the sensory input convey about some common underlying cause. When given random dot stereograms of curved surfaces, this procedure learns to extract surface depth because that is the property that is coherent across space. It also learns how to interpolate the depth at one location from the depths at nearby locations (Becker and Hint.oll.
Information Processing to Create Eye Movements
Because eye muscles never cocontract and do not deal with external loads, one can write an equation that relates motoneuron firing rate to eye position and velocity - a very uncommon situation in the CNS. The semicircular canals transduce head velocity in a linear manner by using a high background discharge rate, imparting linearity to the premotor circuits that generate eye movements. This has allowed deducing some of the signal processing involved, including a neural network that integrates. These ideas are often summarized by block diagrams. Unfortunately, they are of little value in describing the behavior of single neurons - a fmding supported by neural network models.
Operators and curried functions: Training and analysis of simple recurrent networks
Wiles, Janet, Bloesch, Anthony
We present a framework for programming tbe bidden unit representations of simple recurrent networks based on the use of hint units (additional targets at the output layer). We present two ways of analysing a network trained within this framework: Input patterns act as operators on the information encoded by the context units; symmetrically, patterns of activation over tbe context units act as curried functions of the input sequences. Simulations demonstrate that a network can learn to represent three different functions simultaneously and canonical discriminant analysis is used to investigate bow operators and curried functions are represented in the space of bidden unit activations.
Induction of Multiscale Temporal Structure
Learning structure in temporally-extended sequences is a difficult computational problem because only a fraction of the relevant information is available at any instant. Although variants of back propagation can in principle be used to find structure in sequences, in practice they are not sufficiently powerful to discover arbitrary contingencies, especially those spanning long temporal intervals or involving high order statistics. For example, in designing a connectionist network for music composition, we have encountered the problem that the net is able to learn musical structure that occurs locally in time-e.g., relations among notes within a musical phrase-but not structure that occurs over longer time periods--e.g., relations among phrases. To address this problem, we require a means of constructing a reduced deacription of the sequence that makes global aspects more explicit or more readily detectable. I propose to achieve this using hidden units that operate with different time constants.
The Efficient Learning of Multiple Task Sequences
I present a modular network architecture and a learning algorithm based on incremental dynamic programming that allows a single learning agent to learn to solve multiple Markovian decision tasks (MDTs) with significant transfer of learning across the tasks. I consider a class of MDTs, called composite tasks, formed by temporally concatenating a number of simpler, elemental MDTs. The architecture is trained on a set of composite and elemental MDTs. The temporal structure of a composite task is assumed to be unknown and the architecture learns to produce a temporal decomposition. It is shown that under certain conditions the solution of a composite MDT can be constructed by computationally inexpensive modifications of the solutions of its constituent elemental MDTs. 1 INTRODUCTION Most applications of domain independent learning algorithms have focussed on learning single tasks. Building more sophisticated learning agents that operate in complex environments will require handling multiple tasks/goals (Singh, 1992). Research effort on the scaling problem has concentrated on discovering faster learning algorithms, and while that will certainly help, techniques that allow transfer of learning across tasks will be indispensable for building autonomous learning agents that have to learn to solve multiple tasks. In this paper I consider a learning agent that interacts with an external, finite-state, discrete-time, stochastic dynamical environment and faces multiple sequences of Markovian decision tasks (MDTs).
Generalization Performance in PARSEC - A Structured Connectionist Parsing Architecture
This paper presents PARSECa system for generating connectionist parsing networks from example parses. PARSEC is not based on formal grammar systems and is geared toward spoken language tasks. PARSEC networks exhibit three strengths important for application to speech processing: 1) they learn to parse, and generalize well compared to handcoded grammars; 2) they tolerate several types of noise; 3) they can learn to use multi-modal input. Presented are the PARSEC architecture and performance analyses along several dimensions that demonstrate PARSEC's features. PARSEC's performance is compared to that of traditional grammar-based parsing systems.