Perceptrons
Complexity of Finite Precision Neural Network Classifier
A rigorous analysis of the finite-precision computational aspects of a neural network as a pattern classifier, via a probabilistic approach, is presented. Even though there exist negative results on the capability of the perceptron, we show the following positive results: given $n$ pattern vectors, each represented by $cn$ bits where $c > 1$, that are uniformly distributed, with high probability the perceptron can perform all possible binary classifications of the patterns. Moreover, the resulting neural network requires a vanishingly small proportion $O(\log n / n)$ of the memory that would be required for complete storage of the patterns. Further, the perceptron algorithm takes $O(n^2)$ arithmetic operations with high probability, whereas other methods such as linear programming take $O(n^{3.5})$ in the worst case. We also indicate some mathematical connections with VLSI circuit testing and the theory of random matrices.
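As an illustration of the claim that a perceptron can realize every binary classification of a small pattern set, the following is a minimal sketch of the classic perceptron learning rule with integer weights. It is not the authors' analysis or code; the names `train_perceptron`, `classify`, and `shatters`, and the `max_epochs` cutoff, are assumptions made for this example.

```python
from itertools import product


def train_perceptron(patterns, labels, max_epochs=1000):
    """Classic perceptron rule on +/-1 labels.

    Returns (weights, bias) once every pattern is classified correctly,
    or None if no separator is found within max_epochs (hypothetical cutoff).
    """
    w = [0] * len(patterns[0])
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(patterns, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


def classify(w, b, x):
    """Sign of the affine function w.x + b, as a +/-1 label."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def shatters(patterns, max_epochs=1000):
    """Check that every one of the 2^n labelings of the patterns is learned."""
    for labels in product((-1, 1), repeat=len(patterns)):
        model = train_perceptron(patterns, labels, max_epochs)
        if model is None:
            return False
        w, b = model
        if any(classify(w, b, x) != y for x, y in zip(patterns, labels)):
            return False
    return True


if __name__ == "__main__":
    # Three linearly independent +/-1 patterns: all 8 labelings are separable.
    patterns = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
    print(shatters(patterns))
```

The exhaustive check in `shatters` grows as $2^n$, so it is only practical for very small pattern sets; it is meant to make the statement concrete, not to reproduce the probabilistic result.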
Speech Recognition Using Demi-Syllable Neural Prediction Model
The Neural Prediction Model is a speech recognition model based on pattern prediction by multilayer perceptrons. Its effectiveness was confirmed by speaker-independent digit recognition experiments. This paper presents an improvement to the model and its application to large vocabulary speech recognition based on subword units. The improvement is the introduction of "backward prediction," which further improves the prediction accuracy of the original model, which used only "forward prediction." In applying the model to speaker-dependent large vocabulary speech recognition, the demi-syllable is used as the subword recognition unit. Experimental results indicated 95.2% recognition accuracy on a 5000-word test set, confirming the effectiveness of both the proposed model improvement and the demi-syllable subword units.
Connectionist Approaches to the Use of Markov Models for Speech Recognition
Previous work has shown the ability of Multilayer Perceptrons (MLPs) to estimate emission probabilities for Hidden Markov Models (HMMs). The advantages of a speech recognition system incorporating both MLPs and HMMs are better discrimination and the ability to incorporate multiple sources of evidence (features, temporal context) without restrictive assumptions about distributions or statistical independence. This paper presents results on the speaker-dependent portion of DARPA's English language Resource Management database. The results support the previously reported utility of MLP probability estimation for continuous speech recognition. An additional approach we are pursuing is to use MLPs as nonlinear predictors for autoregressive HMMs.
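The abstract does not spell out how MLP outputs become HMM emission scores. One common approach in hybrid systems, assumed here for illustration only, is to treat the MLP outputs as class posteriors $P(q \mid x)$ and divide by the class priors $P(q)$, which by Bayes' rule gives likelihoods up to a factor $p(x)$ that is the same for all states. The function name `scaled_likelihoods` is hypothetical.

```python
def scaled_likelihoods(posteriors, priors):
    """Convert per-state posteriors P(q|x) into scaled likelihoods P(q|x)/P(q).

    By Bayes' rule, P(q|x)/P(q) = p(x|q)/p(x); the common factor p(x)
    does not affect which state sequence scores best in decoding.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    if any(q <= 0 for q in priors):
        raise ValueError("priors must be strictly positive")
    return [p / q for p, q in zip(posteriors, priors)]


if __name__ == "__main__":
    # A rare state (prior 0.25) with posterior 0.5 is boosted relative to
    # a frequent state (prior 0.75) with the same posterior.
    print(scaled_likelihoods([0.5, 0.5], [0.25, 0.75]))
```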
Multi-Layer Perceptrons with B-Spline Receptive Field Functions
Multi-layer perceptrons are often slow to learn nonlinear functions with complex local structure because of the global nature of their function approximations. It is shown that standard multi-layer perceptrons are actually a special case of a more general network formulation that incorporates B-splines into the node computations. This allows novel spline network architectures to be developed that combine the generalization capabilities and scaling properties of global multi-layer feedforward networks with the computational efficiency and learning speed of local computational paradigms. Simulation results are presented for the well-known spiral problem of Wieland and of Lang and Witbrock to show the effectiveness of the Spline Net approach.
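To make the "local" character of B-splines concrete, here is a minimal sketch of the standard Cox-de Boor recursion for B-spline basis functions. It shows only the textbook basis, not the paper's node formulation, which the abstract does not give; the function name `bspline_basis` and the knot vector are assumptions for this example.

```python
def bspline_basis(i, k, t, knots):
    """Value at t of the i-th B-spline basis function of order k (degree k-1).

    Uses the Cox-de Boor recursion; terms with a zero-width knot span
    are taken to be zero.
    """
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    if right_den != 0:
        right = (knots[i + k] - t) / right_den * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


if __name__ == "__main__":
    knots = [0, 1, 2, 3, 4, 5]
    # Each quadratic (order 3) basis function is nonzero on only three knot spans.
    for i in range(3):
        print(i, bspline_basis(i, 3, 2.5, knots))
```

Because each basis function vanishes outside a few knot spans, only a few of them respond to any given input, which is the kind of locality the abstract contrasts with global sigmoid units.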
Phonetic Classification and Recognition Using the Multi-Layer Perceptron
In this paper, we will describe several extensions to our earlier work, utilizing a segment-based approach. We will formulate our segmental framework and report our study on the use of multi-layer perceptrons for detection and classification of phonemes. We will also examine the outputs of the network, and compare the network performance with other classifiers. Our investigation is performed within a set of experiments that attempts to recognize 38 vowels and consonants in American English independent of speaker. When evaluated on the TIMIT database, our system achieves an accuracy of 56%.
Comparison of three classification techniques: CART, C4.5 and Multi-Layer Perceptrons
This paper opens with introductory remarks on the classification problem as considered in various research communities and discusses the reasons for assessing the performance of three chosen algorithms: CART (Classification and Regression Tree), C4.5 (one of the more recent versions of a popular induction tree technique known as ID3), and a multi-layer perceptron (MLP). The algorithms are then compared under two criteria: classification and generalisation. It is found that, in general, the MLP has better classification and generalisation accuracy than the other two algorithms.
Dynamics of Generalization in Linear Perceptrons
We study the evolution of the generalization ability of a simple linear perceptron with $N$ inputs which learns to imitate a "teacher perceptron". The system is trained on $p = \alpha N$ binary example inputs and the generalization ability measured by testing for agreement with the teacher on all $2^N$ possible binary input patterns. The dynamics may be solved analytically and exhibits a phase transition from imperfect to perfect generalization at $\alpha = 1$. Except at this point the generalization ability approaches its asymptotic value exponentially, with critical slowing down near the transition; the relaxation time is $\propto (1 - \sqrt{\alpha})^{-2}$. Right at the critical point, the approach to perfect generalization follows a power law $\propto t^{-1/2}$.
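To give a feel for the critical slowing down, the sketch below evaluates the scaling factor $(1 - \sqrt{\alpha})^{-2}$ for a few load values, where $\alpha = p/N$ is the ratio of examples to inputs. The proportionality constant is not given in the abstract and is simply omitted; the function name `relaxation_time_scale` is an assumption for this example.

```python
import math


def relaxation_time_scale(alpha):
    """Scaling factor (1 - sqrt(alpha))**-2 of the relaxation time.

    Diverges as alpha -> 1, where the exponential approach is replaced
    by a power law; the overall prefactor is unknown and left out.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if alpha == 1:
        raise ValueError("relaxation time diverges at alpha = 1")
    return (1.0 - math.sqrt(alpha)) ** -2


if __name__ == "__main__":
    for alpha in (0.25, 0.81, 0.99, 1.21, 4.0):
        print(alpha, relaxation_time_scale(alpha))
```

For instance, $\alpha = 0.25$ gives a factor of 4, while values of $\alpha$ close to 1 on either side give factors in the hundreds or more, which is the slowing down the abstract describes.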
The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems
We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions. The expectations $\langle \cdot \rangle$ of training set and test set errors are taken over possible training sets $\xi$ and training and test sets $\xi'$ respectively. The effective number of parameters $p_{\mathrm{eff}}(\lambda)$ usually differs from the true number of model parameters $p$ for nonlinear or regularized models; this theoretical conclusion is supported by Monte Carlo experiments. In addition to the surprising result that $p_{\mathrm{eff}}(\lambda) \neq p$, we propose an estimate of (1) called the generalized prediction error (GPE), which generalizes well-established estimates of prediction risk such as Akaike's FPE and AIC, Mallows' $C_p$, and Barron's PSE to the nonlinear setting.
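A simple way to see why an effective parameter count can differ from the true count is the textbook linear ridge-regression case, where the effective degrees of freedom are $\sum_i d_i^2 / (d_i^2 + \lambda)$ for design-matrix singular values $d_i$ and penalty $\lambda \lVert w \rVert^2$. This is offered as an analogy under stated assumptions, not as the paper's nonlinear formula; reading $\lambda$ as a regularization strength and the name `effective_parameters` are assumptions for this sketch.

```python
def effective_parameters(singular_values, lam):
    """Effective degrees of freedom of linear ridge regression.

    singular_values: singular values d_i of the design matrix.
    lam: non-negative ridge penalty (assumed meaning of lambda here).
    Each direction contributes d_i^2 / (d_i^2 + lam), a number in [0, 1].
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return sum(d * d / (d * d + lam) for d in singular_values)


if __name__ == "__main__":
    d = [2.0, 1.0, 0.5]  # three true parameters
    for lam in (0.0, 0.25, 1.0, 4.0):
        print(lam, effective_parameters(d, lam))
```

With no penalty the count equals the number of parameters; as the penalty grows, weakly determined directions are shrunk first and the effective count falls below the true one.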
Neural Network Diagnosis of Avascular Necrosis from Magnetic Resonance Images
Avascular necrosis (AVN) of the femoral head is a common yet potentially serious disorder which can be detected in its very early stages with magnetic resonance imaging. We have developed multi-layer perceptron networks, trained with conjugate gradient optimization, which diagnose AVN from single magnetic resonance images of the femoral head with 100% accuracy on training data and 97% accuracy on test data.
A Network of Localized Linear Discriminants
The localized linear discriminant network (LLDN) has been designed to address classification problems containing relatively closely spaced data from different classes (encounter zones [1], the accuracy problem [2]). Locally trained hyperplane segments are an effective way to define the decision boundaries for these regions [3]. The LLD uses a modified perceptron training algorithm for effective discovery of separating hyperplane/sigmoid units within narrow boundaries. The basic unit of the network is the discriminant receptive field (DRF), which combines the LLD function with Gaussians representing the dispersion of the local training data with respect to the hyperplane. The DRF implements a local distance measure [4], and obtains the benefits of networks of localized units [5].