Asia
Nonlinear Image Interpolation using Manifold Learning
Bregler, Christoph, Omohundro, Stephen M.
The problem of interpolating between specified images in an image sequence is a simple, but important task in model-based vision. We describe an approach based on the abstract task of "manifold learning" and present results on both synthetic and real image sequences. This problem arose in the development of a combined lipreading and speech recognition system.
Using a neural net to instantiate a deformable model
Williams, Christopher K. I., Revow, Michael, Hinton, Geoffrey E.
Deformable models are an attractive approach to recognizing nonrigid objects which have considerable within class variability. However, there are severe search problems associated with fitting the models to data. We show that by using neural networks to provide better starting points, the search time can be significantly reduced. The method is demonstrated on a character recognition task. In previous work we have developed an approach to handwritten character recognition based on the use of deformable models (Hinton, Williams and Revow, 1992a; Revow, Williams and Hinton, 1993). We have obtained good performance with this method, but a major problem is that the search procedure for fitting each model to an image is very computationally intensive, because there is no efficient algorithm (like dynamic programming) for this task. In this paper we demonstrate that it is possible to "compile down" some of the knowledge gained while fitting models to data to obtain better starting points that significantly reduce the search time. 1 DEFORMABLE MODELS FOR DIGIT RECOGNITION The basic idea in using deformable models for digit recognition is that each digit has a model, and a test image is classified by finding the model which is most likely to have generated it. The quality of the match between model and test image depends on the deformation of the model, the amount of ink that is attributed to noise and the distance of the remaining ink from the deformed model.
Unsupervised Classification of 3D Objects from 2D Views
Suzuki, Satoshi, Ando, Hiroshi
The human visual system can recognize various 3D (three-dimensional) objects from their 2D (two-dimensional) retinal images although the images vary significantly as the viewpoint changes. Recent computational models have explored how to learn to recognize 3D objects from their projected views (Poggio & Edelman, 1990). Most existing models are, however, based on supervised learning, i.e., during training the teacher tells which object each view belongs to. The model proposed by Weinshall et al. (1990) also requires a signal that segregates different objects during training. This paper, on the other hand, discusses unsupervised aspects of 3D object recognition where the system discovers categories by itself.
Associative Decorrelation Dynamics: A Theory of Self-Organization and Optimization in Feedback Networks
This paper outlines a dynamic theory of development and adaptation in neural networks with feedback connections. Given input ensemble, the connections change in strength according to an associative learning rule and approach a stable state where the neuronal outputs are decorrelated. We apply this theory to primary visual cortex and examine the implications of the dynamical decorrelation of the activities of orientation selective cells by the intracortical connections. The theory gives a unified and quantitative explanation of the psychophysical experiments on orientation contrast and orientation adaptation. Using only one parameter, we achieve good agreements between the theoretical predictions and the experimental data.
Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition
Zhao, Ying, Schwartz, Richard M., Sroka, Jason J., Makhoul, John
In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMMbased continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,OOO-word and 4O,OOO-word Wall Street Journal corpus using HME models. 1 Introduction Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems.
Non-linear Prediction of Acoustic Vectors Using Hierarchical Mixtures of Experts
Waterhouse, Steve R., Robinson, Anthony J.
We are concerned in this paper with the application of multiple models, specifically the Hierarchical Mixtures of Experts, to time series prediction, specifically the problem of predicting acoustic vectors for use in speech coding. There have been a number of applications of multiple models in time series prediction. A classic example is the Threshold Autoregressive model (TAR) which was used by Tong & 836 S. R. Waterhouse, A. J. Robinson Lim (1980) to predict sunspot activity. More recently, Lewis, Kay and Stevens (in Weigend & Gershenfeld (1994)) describe the use of Multivariate and Regression Splines (MARS) to the prediction of future values of currency exchange rates. Finally, in speech prediction, Cuperman & Gersho (1985) describe the Switched Inter-frame Vector Prediction (SIVP) method which switches between separate linear predictors trained on different statistical classes of speech.
Implementation of Neural Hardware with the Neural VLSI of URAN in Applications with Reduced Representations
Han, Il Song, Kim, Ki-Chul, Lee, Hwang-Soo
This paper describes a way of neural hardware implementation with the analog-digital mixed mode neural chip. The full custom neural VLSI of Universally Reconstructible Artificial Neural network (URAN) is used to implement Korean speech recognition system. A multi-layer perceptron with linear neurons is trained successfully under the limited accuracy in computations. The network with a large frame input layer is tested to recognize spoken korean words at a forward retrieval. Multichip hardware module is suggested with eight chips or more for the extended performance and capacity.
Learning with Preknowledge: Clustering with Point and Graph Matching Distance Measures
Gold, Steven, Rangarajan, Anand, Mjolsness, Eric
Recently, the importance of such preknowledge for learning has been convincingly argued from a statistical framework [Geman et al., 1992]. Researchers have proposed that our brains may incorporate preknowledge in the form of distance measures [Shepard, 1989]. The neural network community has begun to explore this idea via tangent distance [Simard et al., 1993], model learning [Williams et al., 1993] and point matching distances [Gold et al., 1994]. However, only the point matching distances have been invariant under permutations. Here we extend that work by enhancing both the scope and function of those distance measures, significantly expanding the problem domains where learning may take place. We learn objects consisting of noisy 2-D point-sets or noisy weighted graphs by clustering with point matching and graph matching distance measures. The point matching measure is approx.
Active Learning with Statistical Models
Cohn, David A., Ghahramani, Zoubin, Jordan, Michael I.
For many types of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992; Cohn, 1994]. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.