Technology
Discovering Viewpoint-Invariant Relationships That Characterize Objects
Zemel, Richard S., Hinton, Geoffrey E.
Richard S. Zemel and Geoffrey E. Hinton, Department of Computer Science, University of Toronto, Toronto, ONT M5S 1A4
Abstract: Using an unsupervised learning procedure, a network is trained on an ensemble of images of the same two-dimensional object at different positions, orientations and sizes. Each half of the network "sees" one fragment of the object, and tries to produce as output a set of 4 parameters that have high mutual information with the 4 parameters output by the other half of the network. Given the ensemble of training patterns, the 4 parameters on which the two halves of the network can agree are the position, orientation, and size of the whole object, or some recoding of them. After training, the network can reject instances of other shapes by using the fact that the predictions made by its two halves disagree. If two competing networks are trained on an unlabelled mixture of images of two objects, they cluster the training cases on the basis of the objects' shapes, independently of the position, orientation, and size.
1 INTRODUCTION
A difficult problem for neural networks is to recognize objects independently of their position, orientation, or size.
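To give a feel for what "high mutual information" between the two halves' outputs means, here is a minimal sketch for a single scalar output per half. It assumes the two outputs are jointly Gaussian, in which case the mutual information is -0.5 * ln(1 - rho^2) with rho their correlation. The toy data, the noise level, and the function name are illustrative assumptions; this is not the objective or the architecture used in the paper.

```python
import math
import random


def gaussian_mutual_information(a, b):
    """Mutual information (in nats) of two scalar samples, ASSUMING they are
    jointly Gaussian: I = -0.5 * ln(1 - rho^2)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    rho_sq = (cov * cov) / (var_a * var_b)
    return -0.5 * math.log(1.0 - rho_sq)


rng = random.Random(0)

# Hypothetical toy data: one hidden "pose" value per image (e.g. orientation).
pose = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Each half sees a different fragment but recovers the same pose, plus noise.
half_a = [p + rng.gauss(0.0, 0.1) for p in pose]
half_b = [p + rng.gauss(0.0, 0.1) for p in pose]

# A half whose output is unrelated to the pose shares no information with A.
unrelated = [rng.gauss(0.0, 1.0) for _ in pose]

mi_agree = gaussian_mutual_information(half_a, half_b)
mi_unrelated = gaussian_mutual_information(half_a, unrelated)
print(f"agreeing halves:  {mi_agree:.3f} nats")
print(f"unrelated output: {mi_unrelated:.3f} nats")
```

In this sketch the mutual information is large only when both outputs track the same underlying quantity, which is the intuition behind training the two halves to agree.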
Applications of Neural Networks in Video Signal Processing
Pearson, John C., Spence, Clay D., Sverdlove, Ronald
Although color TV is an established technology, there are a number of longstanding problems for which neural networks may be suited. Impulse noise is such a problem, and a modular neural network approach is presented in this paper. The training and analysis were done on conventional computers, while real-time simulations were performed on a massively parallel computer called the Princeton Engine. The network approach was compared to a conventional alternative, a median filter. Real-time simulations and quantitative analysis demonstrated the technical superiority of the neural system. Ongoing work is investigating the complexity and cost of implementing this system in hardware.
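For readers unfamiliar with the conventional baseline named in the abstract, the following is a minimal sketch of a one-dimensional median filter removing an impulse from a scan line. The window size, the sample values, and the function name are illustrative assumptions; the abstract does not specify the filter configuration used in the comparison.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each sample with the median of a window centred on it.

    The window is truncated at the ends of the signal rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


# A flat scan line with a single impulse ("salt") sample at index 3.
scan_line = [10, 10, 10, 255, 10, 10, 10]
filtered = median_filter(scan_line, window=3)
print(filtered)
```

Because the impulse is an outlier within every window that contains it, it never becomes the median and is removed. The same property also discards genuine fine detail, which is one reason a learned detector might be considered as an alternative.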
Natural Dolphin Echo Recognition Using an Integrator Gateway Network
Roitblat, Herbert L., Moore, Patrick W. B., Nachtigall, Paul E., Penner, Ralph H.
We have been studying the performance of a bottlenosed dolphin on a delayed matching-to-sample task to gain insight into the processes and mechanisms that the animal uses during echolocation. The dolphin recognizes targets by emitting natural sonar signals and listening to the echoes that return. This paper describes a novel neural network architecture, called an integrator gateway network, that we have developed to account for this performance. The integrator gateway network combines information from multiple echoes to classify targets with about 90% accuracy. In contrast, a standard backpropagation network performed with only about 63% accuracy.
Speech Recognition Using Connectionist Approaches
This paper is a summary of the SPRINT project's aims and results. The project focuses on the use of neuro-computing techniques to tackle various problems that remain unsolved in speech recognition. First results concern the use of feedforward nets for phonetic unit classification, isolated word recognition, and speaker adaptation.
From Speech Recognition to Spoken Language Understanding: The Development of the MIT SUMMIT and VOYAGER Systems
Zue, Victor, Glass, James, Goodine, David, Hirschman, Lynette, Leung, Hong, Phillips, Michael, Polifroni, Joseph, Seneff, Stephanie
Spoken input to computers, however, has yet to pass the threshold of practicality. Despite some recent successful demonstrations, current speech recognition systems typically fall far short of human capabilities of continuous speech recognition with essentially unrestricted vocabulary and speakers, under adverse acoustic environments.
Phonetic Classification and Recognition Using the Multi-Layer Perceptron
Leung, Hong C., Glass, James R., Phillips, Michael S., Zue, Victor W.
In this paper, we will describe several extensions to our earlier work, utilizing a segment-based approach. We will formulate our segmental framework and report our study on the use of multi-layer perceptrons for detection and classification of phonemes. We will also examine the outputs of the network, and compare the network performance with other classifiers. Our investigation is performed within a set of experiments that attempts to recognize 38 vowels and consonants in American English independent of speaker.
Exploratory Feature Extraction in Speech Signals
A novel unsupervised neural network for dimensionality reduction which seeks directions emphasizing multimodality is presented, and its connection to exploratory projection pursuit methods is discussed. This leads to a new statistical insight into the synaptic modification equations governing learning in Bienenstock, Cooper, and Munro (BCM) neurons (1982). The importance of a dimensionality reduction principle based solely on distinguishing features is demonstrated using a linguistically motivated phoneme recognition experiment, and compared with feature extraction using a back-propagation network.
1 Introduction
Due to the curse of dimensionality (Bellman, 1961) it is desirable to extract features from a high-dimensional data space before attempting a classification. How to perform this feature extraction/dimensionality reduction is not that clear. A first simplification is to consider only features defined by linear (or semi-linear) projections of high-dimensional data. This class of features is used in projection pursuit methods (see review in Huber, 1985). Even after this simplification, it is still difficult to characterize what interesting projections are, although it is easy to point at projections that are uninteresting. A statement that has recently been made precise by Diaconis and Freedman (1984) says that for most high-dimensional clouds, most low-dimensional projections are approximately normal. This finding suggests that the important information in the data is conveyed in those directions whose single-dimensional projected distribution is far from Gaussian, especially at the center of the distribution.
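The idea that interesting directions are those whose projected distribution is far from Gaussian can be illustrated with a simple projection index. The sketch below scores a direction by the excess kurtosis of the projected data, which is near zero for a Gaussian and clearly negative for a well-separated bimodal distribution. Kurtosis is a stand-in chosen for brevity; it is not the BCM-related index discussed in the paper, and the toy data and function names are assumptions.

```python
import math
import random


def project(points, direction):
    """Project each point onto the unit vector along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [sum(x * u for x, u in zip(point, unit)) for point in points]


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    variance = sum(c * c for c in centred) / n
    fourth_moment = sum(c ** 4 for c in centred) / n
    return fourth_moment / (variance * variance) - 3.0


rng = random.Random(0)

# Toy 2-D cloud: axis 0 is bimodal (clusters near -3 and +3), axis 1 is Gaussian.
points = [
    (rng.choice((-3.0, 3.0)) + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    for _ in range(5000)
]

k_bimodal = excess_kurtosis(project(points, (1.0, 0.0)))
k_gaussian = excess_kurtosis(project(points, (0.0, 1.0)))
print(f"bimodal direction:  {k_bimodal:+.2f}")
print(f"gaussian direction: {k_gaussian:+.2f}")
```

A projection pursuit procedure would search over directions to maximize the magnitude of such an index; here only two fixed directions are compared to keep the example short.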