Technology
Viewpoint Invariant Face Recognition using Independent Component Analysis and Attractor Networks
Bartlett, Marian Stewart, Sejnowski, Terrence J.
We have explored two approaches to recognizing faces across changes in pose. First, we developed a representation of face images based on independent component analysis (ICA) and compared it to a principal component analysis (PCA) representation for face recognition. The ICA basis vectors for this data set were more spatially local than the PCA basis vectors and the ICA representation had greater invariance to changes in pose. Second, we present a model for the development of viewpoint invariant responses to faces from visual experience in a biological system. The temporal continuity of natural visual experience was incorporated into an attractor network model by Hebbian learning following a lowpass temporal filter on unit activities.
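The last idea, Hebbian learning applied to lowpass-filtered unit activities, can be illustrated with a minimal sketch. The abstract does not give the filter or the learning rule, so the exponential running average, the `decay` parameter, and the plain outer-product update below are assumptions chosen for illustration, not the authors' model.

```python
def lowpass_trace(activities, decay):
    """Exponential running average of unit activities over time (assumed filter)."""
    trace = [0.0] * len(activities[0])
    traces = []
    for y in activities:
        trace = [decay * t + (1.0 - decay) * a for t, a in zip(trace, y)]
        traces.append(trace)
    return traces


def hebbian_weights(traces):
    """Accumulate outer-product Hebbian weights, with no self-connections."""
    n = len(traces[0])
    w = [[0.0] * n for _ in range(n)]
    for tr in traces:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += tr[i] * tr[j]
    return w


# Two successive "views": unit 0 is active first, then unit 1. Unit 2 is never active.
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

w_filtered = hebbian_weights(lowpass_trace(views, decay=0.5))
w_unfiltered = hebbian_weights(lowpass_trace(views, decay=0.0))
```

With filtering, units that are active in temporally adjacent views become linked (`w_filtered[0][1]` is positive), whereas without filtering (`decay=0.0`) they stay unconnected. This is the sense in which temporal continuity can associate different views of the same face.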
Effective Training of a Neural Network Character Classifier for Word Recognition
Yaeger, Larry S., Lyon, Richard F., Webb, Brandyn J.
We have been conducting research on bottom-up classification techniques based on trainable artificial neural networks (ANNs), in combination with comprehensive but weakly-applied language models. To focus our work on a subproblem that is tractable enough to lead to usable products in a reasonable time, we have restricted the domain to hand-printing, so that strokes are clearly delineated by pen lifts. In the process of optimizing overall performance of the recognizer, we have discovered some useful techniques for architecting and training ANNs that must participate in a larger recognition process. Some of these techniques, especially the normalization of output error, frequency balancing, and error emphasis, suggest a common theme of significant value derived by reducing the effect of a priori biases in training data to better represent low frequency, low probability samples, including second and third choice probabilities. There is ample prior work in combining low-level classifiers with various search strategies to provide integrated segmentation and recognition for writing (Tappert et al. 1990) and speech (Renals et al. 1992). And there is a rich background in the use of ANNs as classifiers, including their use as a low-level, character classifier in a higher-level word recognition system (Bengio et al. 1995).
Ensemble Methods for Phoneme Classification
Waterhouse, Steve R., Cook, Gary
There is now considerable interest in using ensembles or committees of learning machines to improve the performance of the system over that of a single learning machine. In most neural network ensembles, the ensemble members are trained on either the same data (Hansen & Salamon 1990) or different subsets of the data (Perrone & Cooper 1993). The ensemble members typically have different initial conditions and/or different architectures. The subsets of the data may be chosen at random, with prior knowledge or by some principled approach e.g.
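As a generic illustration of how a committee can be combined, the sketch below averages the class-probability outputs of several members and picks the class with the highest mean. Simple unweighted averaging is an assumption made here for illustration; the abstract does not state which combination rule the authors use, and the function names are hypothetical.

```python
def average_posteriors(member_outputs):
    """Average per-class probabilities across ensemble members."""
    n_members = len(member_outputs)
    n_classes = len(member_outputs[0])
    return [sum(m[c] for m in member_outputs) / n_members for c in range(n_classes)]


def classify(member_outputs):
    """Return the index of the class with the highest averaged probability."""
    avg = average_posteriors(member_outputs)
    return max(range(len(avg)), key=avg.__getitem__)


# Three members disagree on a two-class decision; the committee follows the mean.
outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
committee_average = average_posteriors(outputs)
committee_choice = classify(outputs)
```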
Dual Kalman Filtering Methods for Nonlinear Prediction, Smoothing and Estimation
Prediction, estimation, and smoothing are fundamental to signal processing. To perform these interrelated tasks given noisy data, we form a time series model of the process that generates the data. Taking noise in the system explicitly into account, maximum-likelihood and Kalman frameworks are discussed which involve the dual process of estimating both the model parameters and the underlying state of the system. We review several established methods in the linear case, and propose several
A Constructive Learning Algorithm for Discriminant Tangent Models
Sona, Diego, Sperduti, Alessandro, Starita, Antonina
To reduce the computational complexity of classification systems using tangent distance, Hastie et al. (HSS) developed an algorithm to devise rich models for representing large subsets of the data which computes automatically the "best" associated tangent subspace. Schwenk & Milgram proposed a discriminant modular classification system (Diabolo) based on several autoassociative multilayer perceptrons which use tangent distance as error reconstruction measure. We propose a gradient based constructive learning algorithm for building a tangent subspace model with discriminant capabilities which combines several of the advantages of both HSS and Diabolo: devised tangent models hold discriminant capabilities, space requirements are improved with respect to HSS since our algorithm is discriminant and thus it needs fewer prototype models, dimension of the tangent subspace is determined automatically by the constructive algorithm, and our algorithm is able to learn new transformations.
Neural Network Modeling of Speech and Music Signals
Time series prediction is one of the major applications of neural networks. After a short introduction to the basic theoretical foundations, we argue that the iterated prediction of a dynamical system may be interpreted as a model of the system dynamics. By means of RBF neural networks we describe a modeling approach and extend it to be able to model nonstationary systems. As a practical test for the capabilities of the method, we investigate the modeling of musical and speech signals and demonstrate that the model may be used for synthesis of musical and speech signals.
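Iterated prediction can be sketched as feeding a one-step predictor's output back into its own input window, so that the predictor generates the signal on its own. The `iterate_prediction` helper and the linear-extrapolation predictor below are hypothetical stand-ins used only to keep the example short; in the paper's setting the predictor would be a trained RBF network.

```python
def iterate_prediction(predict_next, seed, n_steps):
    """Generate n_steps samples by repeatedly applying a one-step predictor
    to the most recent window of samples (free-running synthesis)."""
    window = list(seed)
    generated = []
    for _ in range(n_steps):
        nxt = predict_next(window)
        generated.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward by one sample
    return generated


def linear_extrapolation(window):
    """Stand-in one-step predictor (assumption): continue the last slope."""
    return window[-1] + (window[-1] - window[-2])


synthesized = iterate_prediction(linear_extrapolation, seed=[0.0, 1.0], n_steps=3)
```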
A New Approach to Hybrid HMM/ANN Speech Recognition using Mutual Information Neural Networks
Rigoll, Gerhard, Neukirchen, Christoph
This paper presents a new approach to speech recognition with hybrid HMM/ANN technology. While the standard approach to hybrid HMM/ANN systems is based on the use of neural networks as posterior probability estimators, the new approach is based on the use of mutual information neural networks trained with a special learning algorithm in order to maximize the mutual information between the input classes of the network and its resulting sequence of firing output neurons during training. It is shown in this paper that such a neural network is an optimal neural vector quantizer for a discrete hidden Markov model system trained on Maximum Likelihood principles. One of the main advantages of this approach is the fact that such neural networks can be easily combined with HMMs of any complexity with context-dependent capabilities. It is shown that the resulting hybrid system achieves very high recognition rates, which are now already on the same level as the best conventional HMM systems with continuous parameters, and the capabilities of the mutual information neural networks are not yet entirely exploited.
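The quantity being maximized, the mutual information between input classes and the indices of the firing output neurons, can be estimated from co-occurrence counts. The sketch below only shows that empirical estimate; it is not the paper's training algorithm, and the function name and toy labels are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(classes, winners):
    """Empirical mutual information (in bits) between class labels and
    the index of the winning (firing) output neuron for each input frame."""
    n = len(classes)
    joint = Counter(zip(classes, winners))
    class_counts = Counter(classes)
    winner_counts = Counter(winners)
    mi = 0.0
    for (c, w), k in joint.items():
        p_cw = k / n
        mi += p_cw * math.log2(p_cw / ((class_counts[c] / n) * (winner_counts[w] / n)))
    return mi


labels = ["a", "a", "b", "b"]
mi_informative = mutual_information(labels, [0, 0, 1, 1])    # neurons track the class
mi_uninformative = mutual_information(labels, [0, 1, 0, 1])  # neurons ignore the class
```

A quantizer whose firing neuron determines the class reaches the maximum (1 bit for two equiprobable classes), while one whose firing pattern is independent of the class scores zero.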
A Constructive RBF Network for Writer Adaptation
This paper discusses a fairly general adaptation algorithm which augments a standard neural network to increase its recognition accuracy for a specific user. The basis for the algorithm is that the output of a neural network is characteristic of the input, even when the output is incorrect. We exploit this characteristic output by using an Output Adaptation Module (OAM) which maps this output into the correct user-dependent confidence vector. The OAM is a simplified Resource Allocating Network which constructs radial basis functions online. We applied the OAM to construct a writer-adaptive character recognition system for online handprinted characters.
Blind Separation of Delayed and Convolved Sources
Lee, Te-Won, Bell, Anthony J., Lambert, Russell H.
We address the difficult problem of separating multiple speakers with multiple microphones in a real room. We combine the work of Torkkola and Amari, Cichocki and Yang, to give Natural Gradient information maximisation rules for recurrent (IIR) networks, blindly adjusting delays, separating and deconvolving mixed signals. While they work well on simulated data, these rules fail in real rooms which usually involve non-minimum phase transfer functions, not-invertible using stable IIR filters. An approach that sidesteps this problem is to perform infomax on a feedforward architecture in the frequency domain (Lambert 1996). We demonstrate real-room separation of two natural signals using this approach.
Dynamic Features for Visual Speechreading: A Systematic Comparison
Gray, Michael S., Movellan, Javier R., Sejnowski, Terrence J.
Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for performing this task. The main purpose of this research was to systematically compare the performance of a range of dynamic visual features on a speechreading task. We have found that normalization of images to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in generalization performance regardless of the visual representation used. In addition, the dynamic information in the difference between successive frames yielded better performance than optical-flow based approaches, and compression by local low-pass filtering worked surprisingly better than global principal components analysis (PCA). These results are examined and possible explanations are explored.
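Two of the features compared above, the difference between successive frames and compression by local low-pass filtering, can be sketched as follows. The abstract does not specify the filter or subsampling scheme, so non-overlapping block averaging is used here as an assumed stand-in, and the function names are hypothetical.

```python
def frame_differences(frames):
    """Pixelwise difference between each frame and the one before it."""
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def block_average(image, block):
    """Compress an image by averaging non-overlapping block x block patches
    (a crude local low-pass filter followed by subsampling)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h - block + 1, block):
        out_row = []
        for c in range(0, w - block + 1, block):
            total = sum(image[r + i][c + j] for i in range(block) for j in range(block))
            out_row.append(total / (block * block))
        out.append(out_row)
    return out


# Two 4x4 frames: the top-left 2x2 patch brightens between them.
frame0 = [[0, 0, 0, 0] for _ in range(4)]
frame1 = [[4, 4, 0, 0], [4, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

deltas = frame_differences([frame0, frame1])
compressed = block_average(deltas[0], block=2)
```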