Perceptrons
Self-Organizing and Adaptive Algorithms for Generalized Eigen-Decomposition
Chatterjee, Chanchal, Roychowdhury, Vwani P.
The paper is developed in two parts where we discuss a new approach to self-organization in a single-layer linear feed-forward network. First, two novel algorithms for self-organization are derived from a two-layer linear hetero-associative network performing a one-of-m classification, and trained with the constrained least-mean-squared classification error criterion. Second, two adaptive algorithms are derived from these selforganizing proceduresto compute the principal generalized eigenvectors of two correlation matrices from two sequences of random vectors. These novel adaptive algorithms can be implemented in a single-layer linear feed-forward network. We give a rigorous convergence analysis of the adaptive algorithms by using stochastic approximation theory. As an example, we consider a problem of online signal detection in digital mobile communications.
A Constructive Learning Algorithm for Discriminant Tangent Models
Sona, Diego, Sperduti, Alessandro, Starita, Antonina
To reduce the computational complexity of classification systems using tangent distance, Hastie et al. (HSS) developed an algorithm todevise rich models for representing large subsets of the data which computes automatically the "best" associated tangent subspace.Schwenk & Milgram proposed a discriminant modular classification system (Diabolo) based on several autoassociative multilayer perceptrons which use tangent distance as error reconstruction measure. We propose a gradient based constructive learning algorithm for building a tangent subspace model with discriminant capabilities which combines several of the the advantages of both HSS and Diabolo: devised tangent models hold discriminant capabilities, space requirements are improved with respect to HSS since our algorithm is discriminant and thus it needs fewer prototype models, dimension of the tangent subspace is determined automatically by the constructive algorithm, and our algorithm is able to learn new transformations.
Microscopic Equations in Rough Energy Landscape for Neural Networks
We consider the microscopic equations for learning problems in neural networks. The aligning fields of an example are obtained from the cavity fields, which are the fields if that example were absent in the learning process. In a rough energy landscape, we assume that the density of the local minima obey an exponential distribution, yielding macroscopic properties agreeing with the first step replica symmetry breaking solution. Iterating the microscopic equations provide a learning algorithm, which results in a higher stability than conventional algorithms. 1 INTRODUCTION Most neural networks learn iteratively by gradient descent. As a result, closed expressions forthe final network state after learning are rarely known. This precludes further analysis of their properties, and insights into the design of learning algorithms.
Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient
Shun-ichi Amari RIKEN Frontier Research Program, RIKEN, Hirosawa 2-1, Wako-shi 351-01, Japan amari@zoo.riken.go.jp Abstract The parameter space of neural networks has a Riemannian metric structure.The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest descent direction of a loss function in the Riemannian space. The behavior of the stochastic gradient learning algorithm is much more effective if the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and prove that the online learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed and analyzed in terms of the Riemannian measure andis shown to be efficient. The natural gradient is finally applied to blind separation of mixtured independent signal sources. 1 Introd uction Neural learning takes place in the parameter space of modifiable synaptic weights of a neural network.
Finite size scaling of the bayesian perceptron
Buhot, A., Moreno, J. -M. Torres, Gordon, M. B.
We study numerically the properties of the bayesian perceptron through a gradient descent on the optimal cost function. The theoretical distribution of stabilities is deduced. It predicts that the optimal generalizer lies close to the boundary of the space of (error-free) solutions. The numerical simulations are in good agreement with the theoretical distribution. The extrapolation of the generalization error to infinite input space size agrees with the theoretical results. Finite size corrections are negative and exhibit two different scaling regimes, depending on the training set size. The variance of the generalization error vanishes for $N \rightarrow \infty$ confirming the property of self-averaging.
A Realizable Learning Task which Exhibits Overfitting
In this paper we examine a perceptron learning task. The task is realizable since it is provided by another perceptron with identical architecture. Both perceptrons have nonlinear sigmoid output functions. The gain of the output function determines the level of nonlinearity of the learning task. It is observed that a high level of nonlinearity leads to overfitting. We give an explanation for this rather surprising observation and develop a method to avoid the overfitting. This method has two possible interpretations, one is learning with noise, the other cross-validated early stopping.
The Gamma MLP for Speech Phoneme Recognition
Lawrence, Steve, Tsoi, Ah Chung, Back, Andrew D.
We define a Gamma multi-layer perceptron (MLP) as an MLP with the usual synaptic weights replaced by gamma filters (as proposed by de Vries and Principe (de Vries and Principe, 1992)) and associated gain terms throughout all layers. We derive gradient descent update equations and apply the model to the recognition of speech phonemes. We find that both the inclusion of gamma filters in all layers, and the inclusion of synaptic gains, improves the performance of the Gamma MLP. We compare the Gamma MLP with TDNN, Back-Tsoi FIR MLP, and Back-Tsoi I1R MLP architectures, and a local approximation scheme. We find that the Gamma MLP results in an substantial reduction in error rates.
Laterally Interconnected Self-Organizing Maps in Hand-Written Digit Recognition
Choe, Yoonsuck, Sirosh, Joseph, Miikkulainen, Risto
The lateral connections learn the correlations of activity between units on the map. The resulting excitatory connections focus the activity into local patches and the inhibitory connections decorrelate redundant activity on the map. The map thus forms internal representations that are easy to recognize with e.g. a perceptron network. The recognition rate on a subset of NIST database 3 is 4.0% higher with LISSOM than with a regular Self-Organizing Map (SOM) as the front end, and 15.8% higher than recognition of raw input bitmaps directly. These results form a promising starting point for building pattern recognition systems with a LISSOM map as a front end. 1 Introduction Handwritten digit recognition has become one of the touchstone problems in neural networks recently.
Learning Sparse Perceptrons
Jackson, Jeffrey C., Craven, Mark
We introduce a new algorithm designed to learn sparse perceptrons over input representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAClearn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand. However, in many applications, such as those that may involve scientific discovery, it is crucial to be able to explain predictions.