Recognizing Hand-Printed Letters and Digits
Martin, Gale, Pittman, James A.
We are developing a hand-printed character recognition system using a multi-layered neural net trained through backpropagation. We report on results of training nets with samples of hand-printed digits scanned off bank checks and hand-printed letters interactively entered into a computer through a stylus digitizer. Given a large training set, and a net with sufficient capacity to achieve high performance on the training set, nets typically achieved error rates of 4-5% at a 0% reject rate and 1-2% at a 10% reject rate. The topology and capacity of the system, as measured by the number of connections in the net, have surprisingly little effect on generalization. For those developing practical pattern recognition systems, these results suggest that a large and representative training sample may be the single most important factor in achieving high recognition accuracy. Reducing capacity does have other benefits, however, especially when the reduction is accomplished by using local receptive fields with shared weights. In this latter case, we find the net evolves feature detectors resembling those in visual cortex and Linsker's orientation-selective nodes.
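For readers unfamiliar with shared-weight local receptive fields, the following is a minimal sketch of the idea: one small weight kernel is reused at every position of the input, so the layer learns a single feature detector replicated across the image. The 20x20 input, 5x5 field, tanh nonlinearity, and function names are illustrative assumptions, not the configuration of the authors' system.

```python
# Minimal sketch of a local-receptive-field layer with shared weights.
# Sizes and names are illustrative assumptions, not the paper's setup.
import numpy as np

def shared_weight_feature_map(image, kernel, bias=0.0):
    """Slide one shared kernel over the image; every receptive-field
    position reuses the same weights, yielding one replicated detector."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.tanh(np.sum(patch * kernel) + bias)
    return out

# Example: a 20x20 character bitmap and a random shared 5x5 kernel.
rng = np.random.default_rng(0)
image = rng.random((20, 20))
kernel = rng.normal(scale=0.1, size=(5, 5))
feature_map = shared_weight_feature_map(image, kernel)
print(feature_map.shape)   # (16, 16)
```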
The Cocktail Party Problem: Speech/Data Signal Separation Comparison between Backpropagation and SONN
Kassebaum, John, Tenorio, Manoel Fernando, Schaefers, Christoph
This work introduces a new method, the Self Organizing Neural Network (SONN) algorithm, and compares its performance with back propagation in a signal separation application. The problem is to separate two signals, a modem data signal and a male speech signal, added and transmitted through a 4 kHz channel. The signals are sampled at 8 kHz, and using supervised learning, an attempt is made to reconstruct them. The SONN is an algorithm that constructs its own network topology during training, which is shown to be much smaller than the BP network, faster to train, and free from the trial-and-error network design that characterizes BP. 1. INTRODUCTION Research in neural networks has witnessed major changes in algorithm design focus, motivated by the limitations perceived in the algorithms available at the time.
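As a rough picture of the experimental setup, the sketch below mixes two synthetic sources sampled at 8 kHz and trains a small backpropagation network on windows of the mixture to reconstruct one of them. The stand-in signals, window length, network size, and learning rate are assumptions for illustration; the SONN algorithm itself, which grows its own topology, is not shown.

```python
# Sketch of the supervised signal-separation setup: two sources are summed
# and a plain backpropagation network is trained on windows of the mixture
# to reconstruct one source.  All signals and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)
fs = 8000                       # 8 kHz sampling rate, as in the paper
t = np.arange(2 * fs) / fs
speech = np.sin(2 * np.pi * 300 * t) * np.sin(2 * np.pi * 3 * t)   # stand-in "speech"
modem = np.sign(np.sin(2 * np.pi * 1200 * t))                      # stand-in data signal
mixture = speech + modem

win = 16                        # input window of past mixture samples
X = np.stack([mixture[i:i + win] for i in range(len(t) - win)])
y = speech[win:]                # supervised target: the speech sample

# One-hidden-layer network trained by backpropagation with an MSE loss.
W1 = rng.normal(scale=0.1, size=(win, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1));   b2 = np.zeros(1)
lr = 0.01
for epoch in range(20):
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    dW2 = h.T @ err[:, None] / len(y); db2 = err.mean(keepdims=True)
    dh = err[:, None] @ W2.T * (1 - h ** 2)
    dW1 = X.T @ dh / len(y); db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
print("final reconstruction MSE:", np.mean(err ** 2))
```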
Synergy of Clustering Multiple Back Propagation Networks
Lincoln, William P., Skrzypek, Josef
The properties of a cluster of multiple back-propagation (BP) networks are examined and compared to the performance of a single BP network. The underlying idea is that a synergistic effect within the cluster improves the performance and fault tolerance. Five networks were initially trained to perform the same input-output mapping. Following training, a cluster was created by computing an average of the outputs generated by the individual networks. The output of the cluster can be used as the desired output during training by feeding it back to the individual networks. In comparison to a single BP network, a cluster of multiple BP networks exhibits improved generalization and significant fault tolerance. It appears that the cluster's advantage follows from the simple maxim "you can fool some of the single BP's in a cluster all of the time but you cannot fool all of them all of the time" {Lincoln} 1 INTRODUCTION Shortcomings of back-propagation (BP) in supervised learning have been well documented in the past {Soulie, 1987; Bernasconi, 1987}. Often, a network of a finite size does not learn a particular mapping completely or it generalizes poorly.
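The clustering idea lends itself to a very small sketch: train several networks independently on the same mapping, then take the cluster output to be the average of the members' outputs, so a single fooled or faulty member is outvoted. The toy task, logistic member networks, and training loop below are illustrative assumptions, not the authors' experimental configuration.

```python
# Minimal sketch of a cluster of independently trained networks whose
# outputs are averaged.  Task and member networks are illustrative.
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((200, 4))
y = (X.sum(axis=1) > 2.0).astype(float)        # toy input-output mapping

def train_single_net(seed, epochs=200, lr=0.5):
    """One small logistic network trained by gradient descent (stand-in for BP)."""
    r = np.random.default_rng(seed)
    w = r.normal(scale=0.5, size=4); b = 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y); b -= lr * g.mean()
    return w, b

cluster = [train_single_net(seed) for seed in range(5)]   # five member networks

def cluster_output(x):
    """Average the members' outputs; a faulty or fooled member is outvoted."""
    outs = [1 / (1 + np.exp(-(x @ w + b))) for w, b in cluster]
    return np.mean(outs, axis=0)

acc = np.mean((cluster_output(X) > 0.5) == y.astype(bool))
print("cluster training accuracy:", acc)
```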
Generalized Hopfield Networks and Nonlinear Optimization
Reklaitis, Gintaras V., Tsirukis, Athanasios G., Tenorio, Manoel Fernando
A nonlinear neural framework, called the Generalized Hopfield network, is proposed, which is able to solve in a parallel distributed manner systems of nonlinear equations. The method is applied to the general nonlinear optimization problem. We demonstrate GHNs implementing the three most important optimization algorithms, namely the Augmented Lagrangian, Generalized Reduced Gradient and Successive Quadratic Programming methods. The study results in a dynamic view of the optimization problem and offers a straightforward model for the parallelization of the optimization computations, thus significantly extending the practical limits of problems that can be formulated as an optimization problem and which can gain from the introduction of nonlinearities in their structure. The ability of networks of highly interconnected simple nonlinear analog processors (neurons) to solve complicated optimization problems was demonstrated in a series of papers by Hopfield and Tank (Hopfield, 1984), (Tank, 1986).
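To make the "dynamic view" of optimization concrete, the sketch below integrates primal-dual gradient dynamics on an augmented Lagrangian with Euler steps: the variables descend the Lagrangian while the multiplier ascends it. The example problem, penalty weight c, and step size dt are illustrative assumptions and not the paper's GHN formulation.

```python
# Sketch of optimization as a dynamical system: gradient flow on an
# augmented Lagrangian  L = f + lam*h + (c/2)*h^2.  Problem is illustrative.
import numpy as np

def f(x):                       # objective: (x1 - 1)^2 + (x2 - 2)^2
    return (x[0] - 1) ** 2 + (x[1] - 2) ** 2

def grad_f(x):
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])

def h(x):                       # equality constraint: x1 + x2 - 1 = 0
    return x[0] + x[1] - 1

grad_h = np.array([1.0, 1.0])
c = 10.0                        # penalty weight
dt = 0.01                       # Euler integration step

x = np.zeros(2); lam = 0.0
for _ in range(5000):
    # dx/dt = -dL/dx,  dlam/dt = +dL/dlam
    dx = -(grad_f(x) + (lam + c * h(x)) * grad_h)
    dlam = h(x)
    x += dt * dx; lam += dt * dlam
print("x* ≈", x, "constraint residual ≈", h(x))   # converges to (0, 1)
```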
Coupled Markov Random Fields and Mean Field Theory
Geiger, Davi, Girosi, Federico
In recent years many researchers have investigated the use of Markov Random Fields (MRFs) for computer vision. They can be applied, for example, to reconstruct surfaces from sparse and noisy depth data coming from the output of a visual process, or to integrate early vision processes to label physical discontinuities. In this paper we show that by applying mean field theory to these MRF models a class of neural networks is obtained. Those networks can speed up the solution of the MRF models. The method is not restricted to computer vision. 1 Introduction
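As a concrete instance of how a mean field approximation turns an MRF into a deterministic network, the sketch below iterates the mean field equations m_i = tanh(beta * (J * (m_{i-1} + m_{i+1}) + h_i)) for a 1-D binary field with a data term h; the update has exactly the form of an analog neural network relaxing to a fixed point. The chain topology, coupling J, and temperature are illustrative assumptions.

```python
# Mean field iteration for a 1-D binary MRF with nearest-neighbour coupling J
# and external field h derived from noisy data.  Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(3)
true_signal = np.sign(np.sin(np.linspace(0, 4 * np.pi, 200)))   # piecewise ±1 "depth"
h = true_signal + 0.8 * rng.normal(size=200)                    # noisy data as field
J, beta = 1.0, 1.5

m = np.zeros(200)                       # mean field magnetisations
for _ in range(100):                    # synchronous fixed-point iteration
    neigh = np.roll(m, 1) + np.roll(m, -1)
    m = np.tanh(beta * (J * neigh + h))

recovered = np.sign(m)
print("fraction of sites recovered:", np.mean(recovered == true_signal))
```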
The Truth, the Whole Truth, and Nothing But the Truth
Truth maintenance is a collection of techniques for doing belief revision. A truth maintenance system's task is to maintain a set of beliefs in such a way that they are not known to be contradictory and no belief is kept without a reason. Truth maintenance systems were introduced in the late seventies by Jon Doyle, and in the last five years there has been an explosion of interest in this kind of system. In this paper we present an annotated bibliography to the literature of truth maintenance systems, grouping the works referenced according to several classifications.
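The bookkeeping a truth maintenance system performs can be suggested with a tiny sketch: a belief stays "in" only while some justification has all of its antecedents in, so retracting a premise retracts everything that depended on it. The class names and propagation strategy are illustrative assumptions, not any particular system from the bibliography.

```python
# Toy justification-based belief bookkeeping: no belief is kept without a reason.
class Belief:
    def __init__(self, name, premise=False):
        self.name = name
        self.premise = premise            # premises need no justification
        self.justifications = []          # each is a list of antecedent Beliefs

    def is_in(self):
        if self.premise:
            return True
        # some justification must have all of its antecedents in
        return any(all(a.is_in() for a in just) for just in self.justifications)

# Usage: q is believed only because p is; retracting p retracts q.
p = Belief("p", premise=True)
q = Belief("q"); q.justifications.append([p])
print(q.is_in())      # True
p.premise = False
print(q.is_in())      # False: its only reason is gone
```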
Meiosis Networks
A central problem in connectionist modelling is the control of network and architectural resources during learning. In the present approach, weights reflect a coarse prediction history as coded by a distribution of values and parameterized in the mean and standard deviation of these weight distributions. Weight updates are a function of both the mean and standard deviation of each connection in the network and vary as a function of the error signal ("stochastic delta rule"; Hanson, 1990). Consequently, the weights maintain information on their central tendency and their "uncertainty" in prediction. Such information is useful in establishing a policy concerning the size of the nodal complexity of the network and growth of new nodes. For example, during problem solving the present network can undergo "meiosis", producing two nodes where there was one "overtaxed" node as measured by its coefficient of variation. It is shown in a number of benchmark problems that meiosis networks can find minimal architectures, reduce computational complexity, and overall increase the efficiency of the representation learning interaction.
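A compressed sketch of the two mechanisms described above: weights carry a mean and a standard deviation (sampled on each forward pass, as in the stochastic delta rule), and a hidden node splits when the average coefficient of variation of its weights grows too large. The threshold, the halving of the variance at a split, and the array layout are illustrative assumptions rather than the paper's exact schedule.

```python
# Sketch of stochastic-delta-rule weights and the "meiosis" splitting rule.
# Threshold and variance handling at a split are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hidden = 4, 3
mu = rng.normal(scale=0.5, size=(n_in, n_hidden))       # weight means
sigma = np.full((n_in, n_hidden), 0.8)                  # weight standard deviations

def sample_weights():
    """Forward passes use weights drawn from N(mu, sigma)."""
    return mu + sigma * rng.normal(size=mu.shape)

w_sample = sample_weights()          # one stochastic draw, as a forward pass would use

def meiosis(mu, sigma, threshold=1.0):
    """Split every hidden node whose mean coefficient of variation exceeds the
    threshold into two children, each inheriting the parent's means and a
    reduced share of its uncertainty."""
    cv = (sigma / (np.abs(mu) + 1e-8)).mean(axis=0)      # one CV per hidden node
    new_mu, new_sigma = [], []
    for j in range(mu.shape[1]):
        if cv[j] > threshold:                            # "overtaxed" node: split it
            for _ in range(2):
                new_mu.append(mu[:, j] + 0.01 * rng.normal(size=mu.shape[0]))
                new_sigma.append(sigma[:, j] / 2)
        else:
            new_mu.append(mu[:, j]); new_sigma.append(sigma[:, j])
    return np.stack(new_mu, axis=1), np.stack(new_sigma, axis=1)

mu, sigma = meiosis(mu, sigma)
print("hidden nodes after meiosis:", mu.shape[1])
```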
Discovering High Order Features with Mean Field Modules
Galland, Conrad C., Hinton, Geoffrey E.
A new form of the deterministic Boltzmann machine (DBM) learning procedure is presented which can efficiently train network modules to discriminate between input vectors according to some criterion. The new technique directly utilizes the free energy of these "mean field modules" to represent the probability that the criterion is met, the free energy being readily manipulated by the learning procedure. Although conventional deterministic Boltzmann learning fails to extract the higher order feature of shift at a network bottleneck, combining the new mean field modules with the mutual information objective function rapidly produces modules that perfectly extract this important higher order feature without direct external supervision. 1 INTRODUCTION The Boltzmann machine learning procedure (Hinton and Sejnowski, 1986) can be made much more efficient by using a mean field approximation in which stochastic binary units are replaced by deterministic real-valued units (Peterson and Anderson, 1987). Deterministic Boltzmann learning can be used for "multicompletion" tasks in which the subsets of the units that are treated as input or output are varied from trial to trial (Peterson and Hartman, 1988). In this respect it resembles other learning procedures that also involve settling to a stable state (Pineda, 1987). Using the multicompletion paradigm, it should be possible to force a network to explicitly extract important higher order features of an ensemble of training vectors by forcing the network to pass the information required for correct completions through a narrow bottleneck. In back-propagation networks with two or three hidden layers, the use of bottlenecks sometimes allows the learning to explicitly discover important higher order features.
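The mean field machinery these modules rely on can be summarized in a few lines: stochastic binary units are replaced by real values that settle to p_i = sigmoid(sum_j w_ij p_j / T), and the resulting mean field free energy is then available to the learning procedure. The random symmetric weights, temperature, and settling schedule below are illustrative assumptions.

```python
# Mean field settling and the free energy it yields:
#   p_i = sigmoid(sum_j w_ij p_j / T)
#   F   = -1/2 sum_ij w_ij p_i p_j + T sum_i [p_i ln p_i + (1-p_i) ln(1-p_i)]
# Weights, temperature, and iteration count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
n = 8
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)      # symmetric, no self-connections
T = 1.0

p = np.full(n, 0.5)                              # deterministic unit values
for _ in range(50):                              # settle to a fixed point
    p = 1 / (1 + np.exp(-(W @ p) / T))

entropy_term = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
F = -0.5 * p @ W @ p + T * entropy_term
print("settled free energy:", F)
```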
The CHIR Algorithm for Feed Forward Networks with Binary Weights
A new learning algorithm, Learning by Choice of Internal Representations (CHIR), was recently introduced. Whereas many algorithms reduce the learning process to minimizing a cost function over the weights, our method treats the internal representations as the fundamental entities to be determined. The algorithm applies a search procedure in the space of internal representations, and a cooperative adaptation of the weights (e.g. by using the perceptron learning rule). Since the introduction of its basic, single output version, the CHIR algorithm was generalized to train any feed forward network of binary neurons. Here we present the generalized version of the CHIR algorithm, and further demonstrate its versatility by describing how it can be modified in order to train networks with binary (±1) weights. Preliminary tests of this binary version on the random teacher problem are also reported.
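A rough sketch of the two ingredients CHIR combines: the perceptron rule adapts a layer's weights to fit fixed binary internal representations, and a search step flips a hidden unit's value on a pattern when the output layer cannot otherwise be satisfied. The tiny XOR task, the single-flip heuristic, and the loop below are illustrative assumptions; the binary-weight modification reported in the paper is not shown.

```python
# Sketch of internal-representation search plus perceptron weight adaptation.
# Task, flip heuristic, and iteration counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])     # inputs in ±1
y = np.array([-1, 1, 1, -1])                           # XOR targets in ±1

def perceptron_train(inputs, targets, epochs=50):
    """Perceptron rule for a single layer of sign units."""
    w = np.zeros((inputs.shape[1], targets.shape[1])); b = np.zeros(targets.shape[1])
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            out = np.sign(x @ w + b)
            w += np.outer(x, t - out) / 2; b += (t - out) / 2
    return w, b

# Start from a guessed binary internal representation (2 hidden units per pattern).
H = rng.choice([-1, 1], size=(4, 2))
for _ in range(20):
    w2, b2 = perceptron_train(H, y[:, None])           # output layer fits H -> y
    wrong = np.sign(H @ w2 + b2).ravel() != y
    if not wrong.any():
        break
    i = np.flatnonzero(wrong)[0]                       # search step: flip one hidden
    H[i, rng.integers(2)] *= -1                        #   bit of a misclassified pattern
w1, b1 = perceptron_train(X, H)                        # input layer fits X -> H
print("internal representations:\n", H)
```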