
Extended Regularization Methods for Nonconvergent Model Selection

Neural Information Processing Systems

Many techniques for model selection in the field of neural networks correspond to well-established statistical methods. The method of 'stopped training', on the other hand, in which an oversized network is trained until the error on a separate validation set of examples begins to deteriorate, at which point training is halted, is a true innovation, since model selection does not require convergence of the training process. In this paper we show that its performance can be significantly enhanced by extending the non-convergent model selection method of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty terms in which the weighting of the penalty term is adjusted during the training process.

1 INTRODUCTION

One of the central topics in the field of neural networks is that of model selection. Both the theoretical and the practical side of this problem have been intensively investigated, and a vast array of methods have been suggested for this task. A widely used class of techniques starts by choosing an 'oversized' network architecture and then either removes redundant elements based on some measure of saliency (pruning), adds a further term to the cost function that penalizes complexity (penalty terms), or observes the error on a separate validation set of examples and stops training as soon as this performance begins to deteriorate (stopped training).
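
As a concrete illustration of the stopped-training idea, the following sketch trains a deliberately simple model while monitoring a held-out validation set, keeps the best weights seen so far, and halts once validation error has deteriorated for several consecutive checks. The synthetic task, linear model, and patience value are illustrative assumptions, not the paper's actual setup.

import numpy as np

# Synthetic regression task (illustrative stand-in for the paper's benchmarks).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.5 * rng.normal(size=200)

X_train, y_train = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]           # the further validation set

w = np.zeros(10)                          # stands in for an oversized network
lr, patience = 0.01, 5
best_val, best_w, bad_checks = np.inf, w.copy(), 0

for epoch in range(1000):
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad                        # one training step

    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:                # validation error still improving
        best_val, best_w, bad_checks = val_err, w.copy(), 0
    else:                                 # deterioration: count toward stopping
        bad_checks += 1
        if bad_checks >= patience:        # stop training, keep best weights
            break

w = best_w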


Non-Linear Dimensionality Reduction

Neural Information Processing Systems

A method for creating a nonlinear encoder-decoder for multidimensional data with compact representations is presented. The commonly used technique of autoassociation is extended to allow nonlinear representations, and an objective function which penalizes the activations of individual hidden units is shown to yield minimum-dimensional encodings with respect to an allowable reconstruction error.

1 INTRODUCTION

Reducing the dimensionality of data with minimal information loss is important for feature extraction, compact coding, and computational efficiency. The data can be transformed into "good" representations for further processing, constraints among feature variables may be identified, and redundancy eliminated. Many algorithms are exponential in the dimensionality of the input, so even a reduction by a single dimension may provide valuable computational savings. Autoassociating feedforward networks with one hidden layer have been shown to extract the principal components of the data (Baldi & Hornik, 1988). Such networks have been used to extract features and develop compact encodings of the data (Cottrell, Munro & Zipser, 1989). Principal Components Analysis projects the data into a linear subspace.
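
A minimal numpy sketch of the extended autoassociation idea follows: a nonlinear encoder, a deliberately oversized hidden layer, and a loss that adds a penalty on individual hidden-unit activations so that superfluous hidden dimensions are driven toward zero. The architecture, data, and penalty weight lam are assumptions for illustration, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(0)
# Data on a 1-D curve embedded in 3-D, so one hidden unit should suffice.
t = rng.uniform(-1, 1, size=(256, 1))
X = np.hstack([t, np.sin(np.pi * t), t ** 2])

n_in, n_hid = 3, 4                        # deliberately oversized bottleneck
W1 = 0.1 * rng.normal(size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = 0.1 * rng.normal(size=(n_hid, n_in)); b2 = np.zeros(n_in)
lr, lam = 0.05, 1e-3                      # lam weights the activation penalty

for step in range(5000):
    H = np.tanh(X @ W1 + b1)              # nonlinear encoding
    Y = H @ W2 + b2                       # reconstruction
    err = Y - X

    # Gradients of: mean reconstruction error + lam * mean sum_j h_j^2
    dY = 2 * err / len(X)
    dH = dY @ W2.T + 2 * lam * H / len(X)
    dZ = dH * (1 - H ** 2)                # derivative of tanh
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(0)
    W1 -= lr * X.T @ dZ;  b1 -= lr * dZ.sum(0)

# Hidden units with near-zero activation variance are dimensions the
# penalty has suppressed.
print(np.var(np.tanh(X @ W1 + b1), axis=0).round(4))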


Spiral Waves in Integrate-and-Fire Neural Networks

Neural Information Processing Systems

The formation of propagating spiral waves is studied, using computer simulations, in a randomly connected neural network composed of integrate-and-fire neurons with a recovery period and excitatory connections. Network activity is initiated by periodic stimulation at a single point. The results suggest that spiral waves can arise in such a network via a sub-critical Hopf bifurcation.
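
A minimal simulation in the spirit of the model might look as follows. For brevity it replaces the paper's random excitatory connectivity with fixed nearest-neighbour coupling on a grid, so it illustrates propagating activity with a recovery period rather than reproducing the spiral-wave result itself; all parameter values are assumptions.

import numpy as np

N = 40                                   # N x N sheet of integrate-and-fire cells
V = np.zeros((N, N))                     # membrane potentials
refract = np.zeros((N, N), dtype=int)    # remaining recovery (refractory) steps
THETA, R_STEPS, W_SYN = 1.0, 5, 1.1      # threshold, recovery length, coupling

for t in range(200):
    if t % 20 == 0:                      # periodic stimulation at a single point
        V[N // 2, N // 2] = THETA

    fired = (V >= THETA) & (refract == 0)

    # Each spike excites its four grid neighbours (a deterministic stand-in
    # for the paper's random excitatory connections).
    inp = np.zeros_like(V)
    inp[1:, :] += W_SYN * fired[:-1, :]
    inp[:-1, :] += W_SYN * fired[1:, :]
    inp[:, 1:] += W_SYN * fired[:, :-1]
    inp[:, :-1] += W_SYN * fired[:, 1:]

    refract[fired] = R_STEPS             # spiking cells enter recovery
    V[fired] = 0.0                       # and are reset
    active = refract == 0
    V = np.where(active, 0.9 * V + inp, V)   # leaky integration outside recovery
    refract = np.maximum(refract - 1, 0)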


Word Space

Neural Information Processing Systems

Representations of semantic information about words are necessary for many applications of neural networks in natural language processing. This paper describes an efficient, corpus-based method for inducing distributed semantic representations for a large number of words (50,000) from lexical co-occurrence statistics by means of a large-scale linear regression. The representations are successfully applied to word sense disambiguation using a nearest-neighbor method.

1 Introduction

Many tasks in natural language processing require access to semantic information about lexical items and text segments.
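
At toy scale, the pipeline can be sketched as follows: count lexical co-occurrences in a window, derive dense word vectors (here by truncated SVD, standing in for the paper's large-scale linear regression), and assign senses by nearest-neighbour comparison of context vectors. The corpus, window size, and sense inventory are illustrative assumptions.

import numpy as np

corpus = ("the bank raised interest rates . "
          "the river bank was steep . "
          "interest rates fell .").split()

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within a +/-2 word window.
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

# Dense representations: truncated SVD here stands in for the paper's
# large-scale linear regression over co-occurrence statistics.
U, s, _ = np.linalg.svd(C, full_matrices=False)
vecs = U[:, :2] * s[:2]

def context_vector(words):
    """Average the vectors of the context words."""
    return np.mean([vecs[idx[w]] for w in words if w in idx], axis=0)

# Nearest-neighbour sense assignment: compare an ambiguous occurrence's
# context vector against labelled example contexts (a hypothetical inventory).
senses = {"money": context_vector(["interest", "rates"]),
          "river": context_vector(["river", "steep"])}
query = context_vector(["raised", "interest"])
best = max(senses, key=lambda s_: float(senses[s_] @ query)
           / (np.linalg.norm(senses[s_]) * np.linalg.norm(query) + 1e-12))
print(best)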


Object-Based Analog VLSI Vision Circuits

Neural Information Processing Systems

We describe two successfully working analog VLSI vision circuits that move beyond pixel-based early vision algorithms. One circuit, implementing the dynamic wires model, provides dedicated lines of communication among groups of pixels that share a common property; the chip uses this model to compute the arc length of visual contours. Another circuit labels all points inside a given contour with one voltage and all other points with another voltage. Its behavior is very robust, since small breaks in contours are automatically sealed, providing figure-ground segregation in a noisy environment. Both chips are implemented using networks of resistors and switches and represent a step towards object-level processing, since a single voltage value encodes a property of an ensemble of pixels.
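
The figure-ground labelling can be mimicked in software by relaxing a discrete resistor grid whose switches are open at contour pixels: a seed voltage clamped inside the contour then spreads to exactly the enclosed region. The grid size, contour shape, and iteration count below are assumptions; this simulates the principle, not the chip.

import numpy as np

N = 16
contour = np.zeros((N, N), bool)          # a closed square contour (the figure)
contour[4, 4:12] = contour[11, 4:12] = True
contour[4:12, 4] = contour[4:12, 11] = True

V = np.zeros((N, N))
V[7, 7] = 1.0                             # seed voltage clamped inside

# Iterative averaging is a discrete analogue of a resistive network settling:
# switches are open at contour pixels, so no current crosses the contour.
for _ in range(500):
    Vn = V.copy()
    for i in range(N):
        for j in range(N):
            if contour[i, j]:
                continue
            nbrs = [V[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                    if 0 <= a < N and 0 <= b < N and not contour[a, b]]
            if nbrs:
                Vn[i, j] = sum(nbrs) / len(nbrs)
    Vn[7, 7] = 1.0                        # keep the seed clamped
    V = Vn

inside = V > 0.5                          # one voltage inside, another outside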


Kohonen Feature Maps and Growing Cell Structures - a Performance Comparison

Neural Information Processing Systems

A performance comparison of two self-organizing networks, the Kohonen Feature Map and the recently proposed Growing Cell Structures, is made. For this purpose, several performance criteria for self-organizing networks are proposed and motivated. The models are tested on three example problems of increasing difficulty. The Kohonen Feature Map demonstrates slightly superior results only for the simplest problem.
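
For concreteness, here is a minimal Kohonen Feature Map adaptation step together with one plausible performance criterion, quantization error. The paper's actual criteria and parameter schedules are not specified in this abstract, so these choices are assumptions.

import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(size=(500, 2))             # toy input distribution

# A 5x5 Kohonen Feature Map: weight vectors on a fixed grid topology.
grid = np.stack(np.meshgrid(np.arange(5), np.arange(5)), -1).reshape(-1, 2)
W = rng.uniform(size=(25, 2))

def som_step(x, W, lr=0.1, sigma=1.0):
    """One Kohonen adaptation step toward input x."""
    bmu = np.argmin(((W - x) ** 2).sum(1))    # best-matching unit
    d2 = ((grid - grid[bmu]) ** 2).sum(1)     # grid distance to the BMU
    h = np.exp(-d2 / (2 * sigma ** 2))        # neighbourhood kernel
    return W + lr * h[:, None] * (x - W)

for epoch in range(20):
    for x in data:
        W = som_step(x, W)

# Quantization error: one plausible comparison criterion (an assumption,
# not necessarily the paper's exact measure).
qe = np.mean([np.min(((W - x) ** 2).sum(1)) ** 0.5 for x in data])
print(f"quantization error: {qe:.4f}")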


A Recurrent Neural Network for Generation of Ocular Saccades

Neural Information Processing Systems

Electrophysiological studies (Cynader and Berman 1972; Robinson 1972) showed that the intermediate layer of the superior colliculus (SC) is topographically organized into a motor map. The location of active neurons in this area was found to be related to the oculomotor error (i.e.
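
A toy sketch of the motor-map idea: neurons at each location of a 2-D map code a preferred saccade (oculomotor-error) vector, and the commanded saccade is read out from where the activity sits on the map, here by a population-vector average. The map layout, activity profile, and read-out rule are illustrative assumptions, not the paper's model.

import numpy as np

# A 2-D topographic motor map: the neuron at grid position (i, j) codes a
# preferred saccade vector, here simply (i, j) - centre.
N = 21
centre = np.array([N // 2, N // 2])
preferred = np.stack(np.meshgrid(np.arange(N), np.arange(N), indexing="ij"),
                     -1).astype(float) - centre

def decode_saccade(activity):
    """Population-vector read-out of the commanded saccade."""
    w = activity / activity.sum()
    return (w[..., None] * preferred).reshape(-1, 2).sum(0)

# A Gaussian hill of activity centred on the site coding a (5, -3) error.
target = np.array([5.0, -3.0])
d2 = ((preferred - target) ** 2).sum(-1)
activity = np.exp(-d2 / (2 * 2.0 ** 2))

print(decode_saccade(activity))   # approx. (5, -3)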


Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors

Neural Information Processing Systems

We propose a very simple and well-principled way of computing the optimal step size in gradient descent algorithms. The on-line version is computationally very efficient and is applicable to large backpropagation networks trained on large data sets. The main ingredient is a technique for estimating the principal eigenvalue(s) and eigenvector(s) of the objective function's second-derivative matrix (Hessian), which does not require even calculating the Hessian. Several other applications of this technique are proposed for speeding up learning or for eliminating useless parameters.

1 INTRODUCTION

Choosing the appropriate learning rate, or step size, in a gradient descent procedure such as backpropagation is simultaneously one of the most crucial and most expert-intensive parts of neural-network learning. We propose a method for computing the best step size which is well-principled, simple, very cheap computationally, and, most of all, applicable to on-line training with large networks and data sets.
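
The core ingredient can be sketched compactly: a power iteration estimates the Hessian's principal eigenvalue and eigenvector using only gradient evaluations, via the finite difference H v ~ (g(w + eps*v) - g(w)) / eps, and the step size is then scaled by the inverse of that eigenvalue. The quadratic test objective and constants below are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy objective: a quadratic whose Hessian we pretend not to know.
A = rng.normal(size=(20, 20))
H_true = A @ A.T / 20                     # used only to build the gradient
grad = lambda w: H_true @ w               # gradient oracle, as from backprop

def principal_eig(grad, w, n_iter=50, eps=1e-4):
    """Power iteration on the Hessian using only gradient evaluations:
    H v is approximated by (grad(w + eps*v) - grad(w)) / eps."""
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    g0 = grad(w)
    for _ in range(n_iter):
        Hv = (grad(w + eps * v) - g0) / eps
        lam = np.linalg.norm(Hv)
        v = Hv / lam
    return lam, v

w = rng.normal(size=20)
lam_max, v_max = principal_eig(grad, w)
step = 1.0 / lam_max                      # step size scaled by 1 / lambda_max
print(lam_max, np.max(np.linalg.eigvalsh(H_true)))  # estimate vs. truth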


Learning to categorize objects using temporal coherence

Neural Information Processing Systems

The invariance of an object's identity as it transforms over time provides a powerful cue for perceptual learning. We present an unsupervised learning procedure which maximizes the mutual information between the representations adopted by a feed-forward network at consecutive time steps. We demonstrate that the network can learn, entirely unsupervised, to classify an ensemble of several patterns by observing pattern trajectories, even though there are abrupt transitions from one object to another between trajectories. The same learning procedure should be widely applicable to a variety of perceptual learning tasks.

1 INTRODUCTION

A promising approach to understanding human perception is to try to model its developmental stages. There is ample evidence that much of perception is learned.
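
As a rough illustration of learning from temporal coherence, the sketch below optimizes a simple differentiable proxy for the mutual-information objective: representations at consecutive time steps are pulled together while a variance term keeps the code from collapsing. The data generation, proxy loss, and one-layer network are assumptions; the paper's actual objective is the mutual information itself.

import numpy as np

rng = np.random.default_rng(0)

# Three object prototypes, each observed as a short trajectory of slightly
# transformed views (an illustrative stand-in for the paper's pattern data).
protos = rng.normal(size=(3, 8))
seqs = np.concatenate([p + 0.1 * rng.normal(size=(20, 8)) for p in protos])

W = 0.1 * rng.normal(size=(8, 3))
lr, beta = 0.05, 0.5

for epoch in range(200):
    Y = np.tanh(seqs @ W)
    # Coherence term: pull representations at consecutive time steps together.
    diff = Y[1:] - Y[:-1]
    g = np.zeros_like(Y)
    g[1:] += 2 * diff
    g[:-1] -= 2 * diff
    g /= len(Y)
    # Variance term keeps the code from collapsing to a constant (the
    # gradient ignores the mean's own dependence on Y, fine for a sketch).
    g -= beta * 2 * (Y - Y.mean(0)) / len(Y)
    W -= lr * seqs.T @ (g * (1 - Y ** 2))    # backprop through tanh

labels = np.argmax(np.tanh(seqs @ W), axis=1)  # per-pattern class assignments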