Self-Organizing Rules for Robust Principal Component Analysis
Principal Component Analysis (PCA) is an essential technique for data compression and feature extraction, and has been widely used in statistical data analysis, communication theory, pattern recognition and image processing. In the neural network literature, many studies have addressed learning rules for implementing PCA or networks closely related to PCA (see Xu & Yuille, 1993, for a detailed reference list containing more than 30 papers related to these issues).
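As a concrete illustration of the kind of learning rule discussed here, the sketch below implements Oja's single-unit rule, a classic neural PCA rule chosen purely as a familiar example; it is not the robust rule proposed in this paper, and the data, learning rate and iteration count are illustrative assumptions.

```python
import numpy as np

def oja_first_component(X, lr=0.01, epochs=50, seed=0):
    """Estimate the first principal component of X (rows = samples)
    with Oja's rule: w <- w + lr * y * (x - y * w), where y = w.x."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)             # PCA assumes centered data
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x                  # unit output
            w += lr * y * (x - y * w)  # Hebbian term with implicit normalization
    return w / np.linalg.norm(w)

# Illustrative usage on synthetic 2-D data with one dominant direction.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
print("estimated first PC direction:", oja_first_component(X))
```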
Using Prior Knowledge in a NNPDA to Learn Context-Free Languages
Das, Sreerupa, Giles, C. Lee, Sun, Guo-Zheng
Language inference and automata induction using recurrent neural networks have gained considerable interest in recent years. Nevertheless, the success of these models has been mostly limited to regular languages. Additional information in the form of a priori knowledge has proved important, and at times necessary, for learning complex languages (Abu-Mostafa, 1990; Al-Mashouq and Reed, 1991; Omlin and Giles, 1992; Towell, 1990). These studies have demonstrated that partial information incorporated in a connectionist model guides the learning process through constraints, yielding more efficient learning and better generalization. We have previously shown that the NNPDA model can learn Deterministic Context-Free Languages.
Learning Curves, Model Selection and Complexity of Neural Networks
Murata, Noboru, Yoshizawa, Shuji, Amari, Shun-ichi
Learning curves show how a neural network improves as the number of training examples increases and how this improvement is related to the network complexity. The present paper clarifies the asymptotic properties of, and the relation between, two learning curves, one concerning the predictive or generalization loss and the other the training loss. The result gives a natural definition of the complexity of a neural network. Moreover, it provides a new criterion for model selection.
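For orientation, a commonly cited asymptotic form of the two learning curves in this line of work is sketched below; the symbols, constants and regularity conditions are stated as background assumptions, not quoted from the paper.

```latex
% t: number of training examples; L_0: loss of the best network in the class;
% m^*: an effective number of parameters (a complexity measure).
\begin{aligned}
\langle L_{\mathrm{gen}}(t)\rangle   &\approx L_0 + \frac{m^*}{2t},\\[2pt]
\langle L_{\mathrm{train}}(t)\rangle &\approx L_0 - \frac{m^*}{2t},\\[2pt]
\langle L_{\mathrm{gen}}(t)\rangle - \langle L_{\mathrm{train}}(t)\rangle &\approx \frac{m^*}{t}.
\end{aligned}
```

Under such a relation, the gap between the two curves yields both a natural complexity measure and a model-selection rule of the "training loss plus complexity-per-example" type.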
Summed Weight Neuron Perturbation: An O(N) Improvement Over Weight Perturbation
The algorithm presented performs gradient descent on the weight space of an Artificial Neural Network (ANN), using a finite difference to approximate the gradient. The method is novel in that it achieves a computational complexity similar to that of Node Perturbation, O(N³), but does not require access to the activity of hidden or internal neurons. This is possible due to a stochastic relation between perturbations at the weights and at the neurons of an ANN. The algorithm is also similar to Weight Perturbation in that it is optimal in terms of hardware requirements when used for the training of VLSI implementations of ANNs.
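For context, the sketch below shows plain weight perturbation, the baseline this paper improves upon: each gradient component is approximated by a one-sided finite difference of the network error. It is a minimal illustration on an assumed toy error function, not the summed-weight variant the paper proposes.

```python
import numpy as np

def weight_perturbation_step(w, error_fn, lr=0.05, eps=1e-4):
    """One gradient-descent step where each partial derivative is
    approximated by a finite difference: (E(w + eps*e_i) - E(w)) / eps."""
    base = error_fn(w)
    grad = np.empty_like(w)
    for i in range(w.size):
        w_pert = w.copy()
        w_pert[i] += eps
        grad[i] = (error_fn(w_pert) - base) / eps
    return w - lr * grad

# Toy usage: minimize a quadratic "network error" in the weights.
target = np.array([1.0, -2.0, 0.5])
error = lambda w: np.sum((w - target) ** 2)
w = np.zeros(3)
for _ in range(200):
    w = weight_perturbation_step(w, error)
print("weights after training:", w)   # approaches `target`
```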
Parameterising Feature Sensitive Cell Formation in Linsker Networks in the Auditory System
Walton, Lance C., Bisset, David L.
This paper examines and extends the work of Linsker (1986) on self-organising feature detectors. Linsker concentrates on the visual processing system, but infers that the weak assumptions made will allow the model to be used in the processing of other sensory information. This claim is examined here, with special attention paid to the auditory system, where there is much lower connectivity and therefore more statistical variability. Online training is utilised to obtain an idea of training times, which are then compared to the time available to prenatal mammals for the formation of feature sensitive cells.
1 INTRODUCTION
Within the last thirty years, a great deal of research has been carried out in an attempt to understand the development of cells in the pathways between the sensory apparatus and the cortex in mammals. For example, theories for the development of feature detectors were put forward by Nass and Cooper (1975), by Grossberg (1976) and, more recently, by Obermayer et al. (1990). Hubel and Wiesel (1961) established the existence of several different types of feature sensitive cell in the visual cortex of cats. Various subsequent experiments have shown that a considerable amount of development takes place before birth.
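As a rough illustration of the class of development rules involved, the sketch below grows a single feature-sensitive cell with a generic Hebbian update and hard weight bounds. It is a simplified stand-in under assumed parameters, not Linsker's exact layered rule or the auditory model studied here.

```python
import numpy as np

def hebbian_feature_cell(inputs, lr=0.01, w_max=1.0, seed=0):
    """Grow one feature-sensitive cell: Hebbian updates dw_i = lr * x_i * y
    with weights clipped to [-w_max, w_max], as in simple Linsker-style models."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.1, 0.1, size=inputs.shape[1])
    for x in inputs:
        y = w @ x                        # linear cell response
        w += lr * x * y                  # Hebbian correlation term
        np.clip(w, -w_max, w_max, out=w)
    return w

# Illustrative usage: correlated "afferent" inputs from a low-connectivity field.
rng = np.random.default_rng(2)
shared = rng.normal(size=(2000, 1))
inputs = shared + 0.3 * rng.normal(size=(2000, 8))   # 8 afferents, one common source
print(hebbian_feature_cell(inputs))   # weights saturate toward a common sign
```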
Extended Regularization Methods for Nonconvergent Model Selection
Finnoff, W., Hergert, F., Zimmermann, H. G.
Many techniques for model selection in the field of neural networks correspond to well-established statistical methods. The method of 'stopped training', on the other hand, in which an oversized network is trained until the error on a further validation set of examples begins to deteriorate and training is then stopped, is a true innovation, since model selection does not require convergence of the training process. In this paper we show that this performance can be significantly enhanced by extending the 'non-convergent model selection method' of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process.
1 INTRODUCTION
One of the central topics in the field of neural networks is that of model selection. Both the theoretical and practical sides of this have been intensively investigated, and a vast array of methods has been suggested to perform this task. A widely used class of techniques starts by choosing an 'oversized' network architecture and then either removes redundant elements based on some measure of saliency (pruning), adds a further term to the cost function penalizing complexity (penalty terms), or observes the error on a further validation set of examples and stops training as soon as this performance begins to deteriorate (stopped training).
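A minimal sketch of the stopped-training component is given below, assuming a generic `train_one_epoch` / `validation_error` interface (both hypothetical names); the dynamic pruning and adaptive penalty-term extensions described in the paper are not included.

```python
import copy

def stopped_training(model, train_one_epoch, validation_error,
                     max_epochs=500, patience=10):
    """Train until the validation error stops improving ('stopped training').

    `train_one_epoch(model)` performs one pass of weight updates in place;
    `validation_error(model)` returns the error on a held-out validation set.
    """
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        err = validation_error(model)
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break                  # validation error has deteriorated
    return best_model, best_err
```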
Non-Linear Dimensionality Reduction
DeMers, David, Cottrell, Garrison W.
A method for creating a nonlinear encoder-decoder for multidimensional data with compact representations is presented. The commonly used technique of autoassociation is extended to allow nonlinear representations, and an objective function which penalizes activations of individual hidden units is shown to result in minimum-dimensional encodings with respect to allowable error in reconstruction.
1 INTRODUCTION
Reducing the dimensionality of data with minimal information loss is important for feature extraction, compact coding and computational efficiency. The data can be transformed into "good" representations for further processing, constraints among feature variables may be identified, and redundancy eliminated. Many algorithms are exponential in the dimensionality of the input, so even reduction by a single dimension may provide valuable computational savings. Autoassociating feed-forward networks with one hidden layer have been shown to extract the principal components of the data (Baldi & Hornik, 1988). Such networks have been used to extract features and develop compact encodings of the data (Cottrell, Munro & Zipser, 1989). Principal Components Analysis projects the data into a linear subspace.
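A compact sketch of the underlying idea, written with PyTorch as an assumed dependency (the paper long predates it): a nonlinear autoassociator whose loss adds a penalty on bottleneck activations so that unneeded encoding dimensions are driven toward zero. Layer sizes, the penalty weight and the optimizer settings are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class PenalizedAutoencoder(nn.Module):
    """Nonlinear autoassociator: encoder -> small bottleneck -> decoder."""
    def __init__(self, d_in=10, d_hidden=20, d_bottleneck=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Tanh(),
                                     nn.Linear(d_hidden, d_bottleneck))
        self.decoder = nn.Sequential(nn.Linear(d_bottleneck, d_hidden), nn.Tanh(),
                                     nn.Linear(d_hidden, d_in))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Training loop: reconstruction error plus a penalty on bottleneck activations,
# which pushes superfluous encoding dimensions toward zero.
model = PenalizedAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 10)              # stand-in data
lam = 1e-2                            # penalty weight (assumed)
for step in range(1000):
    recon, z = model(x)
    loss = ((recon - x) ** 2).mean() + lam * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```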
Spiral Waves in Integrate-and-Fire Neural Networks
Milton, John G., Chu, Po Hsiang, Cowan, Jack D.
The formation of propagating spiral waves is studied, using computer simulations, in a randomly connected neural network composed of integrate-and-fire neurons with a recovery period and excitatory connections. Network activity is initiated by periodic stimulation at a single point. The results suggest that spiral waves can arise in such a network via a sub-critical Hopf bifurcation.
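To make the model class concrete, the sketch below simulates a single leaky integrate-and-fire unit with an absolute recovery (refractory) period; the randomly connected network, excitatory coupling and point stimulation studied in the paper are not reproduced, and all constants are assumptions.

```python
import numpy as np

def integrate_and_fire(input_current, dt=1.0, tau=20.0, v_thresh=1.0,
                       v_reset=0.0, refractory=5.0):
    """Leaky integrate-and-fire: dv/dt = (-v + I)/tau; spike and reset at
    threshold, then ignore input for `refractory` units of time."""
    v, recover = 0.0, 0.0
    spikes = []
    for t, I in enumerate(input_current):
        if recover > 0:
            recover -= dt              # still in the recovery period
            continue
        v += dt * (-v + I) / tau
        if v >= v_thresh:
            spikes.append(t * dt)
            v = v_reset
            recover = refractory
    return spikes

# Illustrative usage: constant suprathreshold drive produces periodic firing.
print(integrate_and_fire(np.full(500, 1.5)))
```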
Word Space
Representations for semantic information about words are necessary for many applications of neural networks in natural language processing. This paper describes an efficient, corpus-based method for inducing distributed semantic representations for a large number of words (50,000) from lexical co-occurrence statistics by means of a large-scale linear regression. The representations are successfully applied to word sense disambiguation using a nearest neighbor method.
1 Introduction
Many tasks in natural language processing require access to semantic information about lexical items and text segments.
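A reduced-scale sketch of the general approach is given below: it builds raw co-occurrence vectors from a toy corpus and retrieves nearest neighbours by cosine similarity. The paper's large-scale regression step for compressing the representations, and the full disambiguation procedure, are not reproduced; the corpus and window size are assumptions.

```python
from collections import defaultdict
import numpy as np

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a vector of co-occurrence counts over the vocabulary."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = defaultdict(lambda: np.zeros(len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vecs[w][index[s[j]]] += 1
    return dict(vecs), vocab

def nearest_neighbours(word, vecs, k=3):
    """Rank other words by cosine similarity of their co-occurrence vectors."""
    v = vecs[word]
    def cos(u): return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return sorted((w for w in vecs if w != word),
                  key=lambda w: cos(vecs[w]), reverse=True)[:k]

corpus = [["the", "bank", "approved", "the", "loan"],
          ["the", "river", "bank", "was", "muddy"],
          ["the", "loan", "was", "approved"]]
vecs, vocab = cooccurrence_vectors(corpus)
print(nearest_neighbours("loan", vecs))
```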