Goto

Collaborating Authors

 Perceptrons


A Realizable Learning Task which Exhibits Overfitting

Neural Information Processing Systems

In this paper we examine a perceptron learning task. The task is realizable since it is provided by another perceptron with identical architecture. Both perceptrons have nonlinear sigmoid output functions. The gain of the output function determines the level of nonlinearity of the learning task. It is observed that a high level of nonlinearity leads to overfitting. We give an explanation for this rather surprising observation and develop a method to avoid the overfitting. This method has two possible interpretations, one is learning with noise, the other cross-validated early stopping.


On the Computational Power of Noisy Spiking Neurons

Neural Information Processing Systems

It has remained unknown whether one can in principle carry out reliable digital computations with networks of biologically realistic models for neurons. This article presents rigorous constructions for simulating in real-time arbitrary given boolean circuits and finite automata with arbitrarily high reliability by networks of noisy spiking neurons. In addition we show that with the help of "shunting inhibition" even networks of very unreliable spiking neurons can simulate in real-time any McCulloch-Pitts neuron (or "threshold gate"), and therefore any multilayer perceptron (or "threshold circuit") in a reliable manner. These constructions provide a possible explanation for the fact that biological neural systems can carry out quite complex computations within 100 msec. It turns out that the assumption that these constructions require about the shape of the EPSP's and the behaviour of the noise are surprisingly weak. 1 Introduction


Sample Complexity for Learning Recurrent Perceptron Mappings

Neural Information Processing Systems

Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. 1 Introduction One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference (Duda and Hart, 1973).


Learning Sparse Perceptrons

Neural Information Processing Systems

We introduce a new algorithm designed to learn sparse perceptrons overinput representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAClearn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand. 1 Introduction Multi-layer perceptron (MLP) learning is a powerful method for tasks such as concept classification.However, in many applications, such as those that may involve scientific discovery, it is crucial to be able to explain predictions. Multi-layer perceptrons arelimited in this regard, since their representations are notoriously difficult for humans to understand.


Sample Complexity for Learning Recurrent Perceptron Mappings

Neural Information Processing Systems

Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. 1 Introduction One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference (Duda and Hart, 1973).


Examples of learning curves from a modified VC-formalism

Neural Information Processing Systems

We examine the issue of evaluation of model specific parameters in a modified VC-formalism. Two examples are analyzed: the 2-dimensional homogeneous perceptron and the I-dimensional higher order neuron. Both models are solved theoretically, and their learning curves are compared againsttrue learning curves. It is shown that the formalism has the potential to generate a variety of learning curves, including ones displaying ''phase transitions."


The Gamma MLP for Speech Phoneme Recognition

Neural Information Processing Systems

Department of Electrical and Computer Engineering University of Queensland St. Lucia Qld 4072 Australia Abstract We define a Gamma multi-layer perceptron (MLP) as an MLP with the usual synaptic weights replaced by gamma filters (as proposed byde Vries and Principe (de Vries and Principe, 1992)) and associated gain terms throughout all layers. We derive gradient descent update equations and apply the model to the recognition of speech phonemes. We find that both the inclusion of gamma filters in all layers, and the inclusion of synaptic gains, improves the performance of the Gamma MLP. We compare the Gamma MLP with TDNN, Back-Tsoi FIR MLP, and Back-Tsoi I1R MLP architectures, and a local approximation scheme. We find that the Gamma MLP results in an substantial reduction in error rates. 1 INTRODUCTION 1.1 THE GAMMA FILTER Infinite Impulse Response (I1R) filters have a significant advantage over Finite Impulse Response(FIR) filters in signal processing: the length of the impulse response is uncoupled from the number of filter parameters.


Laterally Interconnected Self-Organizing Maps in Hand-Written Digit Recognition

Neural Information Processing Systems

An application of laterally interconnected self-organizing maps (LISSOM) to handwritten digit recognition is presented. The resulting excitatory connections focus the activity into local patches and the inhibitory connections decorrelate redundant activityon the map. The map thus forms internal representations thatare easy to recognize with e.g. a perceptron network. The recognition rate on a subset of NIST database 3 is 4.0% higher with LISSOM than with a regular Self-Organizing Map (SOM) as the front end, and 15.8% higher than recognition of raw input bitmaps directly. These results form a promising starting point for building pattern recognition systems with a LISSOM map as a front end. 1 Introduction Handwritten digit recognition has become one of the touchstone problems in neural networks recently.


The Capacity of a Bump

Neural Information Processing Systems

Recently, several researchers have reported encouraging experimental results whenusing Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a rid¥e activation function. If we are interested in partitioningp points in d dimensions into two classes then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.