Country
A Realizable Learning Task which Exhibits Overfitting
In this paper we examine a perceptron learning task. The task is realizable since it is provided by another perceptron with identical architecture. Both perceptrons have nonlinear sigmoid output functions. The gain of the output function determines the level of nonlinearity of the learning task. It is observed that a high level of nonlinearity leads to overfitting. We give an explanation for this rather surprising observation and develop a method to avoid the overfitting. This method has two possible interpretations, one is learning with noise, the other cross-validated early stopping.
On the Computational Power of Noisy Spiking Neurons
It has remained unknown whether one can in principle carry out reliable digital computations with networks of biologically realistic models for neurons. This article presents rigorous constructions for simulating in real-time arbitrary given boolean circuits and finite automata with arbitrarily high reliability by networks of noisy spiking neurons. In addition we show that with the help of "shunting inhibition" even networks of very unreliable spiking neurons can simulate in real-time any McCulloch-Pitts neuron (or "threshold gate"), and therefore any multilayer perceptron (or "threshold circuit") in a reliable manner. These constructions provide a possible explanation for the fact that biological neural systems can carry out quite complex computations within 100 msec. It turns out that the assumption that these constructions require about the shape of the EPSP's and the behaviour of the noise are surprisingly weak. 1 Introduction
Sample Complexity for Learning Recurrent Perceptron Mappings
DasGupta, Bhaskar, Sontag, Eduardo D.
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. 1 Introduction One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference (Duda and Hart, 1973).
Neural Networks with Quadratic VC Dimension
Koiran, Pascal, Sontag, Eduardo D.
A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function /, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {O, I} and are defined on a common domain X.
Learning with ensembles: How overfitting can be useful
We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data. 1 INTRODUCTION An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.
A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split
We work in a setting in which we must choose the right number of parameters for a hypothesis function in response to a finite training sample, with the goal of minimizing the resulting generalization error. There is a large and interesting literature on cross validation methods, which often emphasizes asymptotic statistical properties, or the exact calculation of the generalization error for simple models. Our approach here is somewhat different, and is pri mari I y inspired by two sources. The first is the work of Barron and Cover [2], who introduced the idea of bounding the error of a model selection method (in their case, the Minimum Description Length Principle) in terms of a quantity known as the index of resolvability. The second is the work of Vapnik [5], who provided extremely powerful and general tools for uniformly bounding the deviations between training and generalization errors. We combine these methods to give a new and general analysis of cross validation performance. In the first and more formal part of the paper, we give a rigorous bound on the error of cross validation in terms of two parameters of the underlying model selection problem: the approximation rate and the estimation rate. In the second and more experimental part of the paper, we investigate the implications of our bound for choosing'Y, the fraction of data withheld for testing in cross validation. The most interesting aspect of this analysis is the identification of several qualitative properties of the optimal'Y that appear to be invariant over a wide class of model selection problems: - When the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of'Y.
Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?
Amari, Shun-ichi, Murata, Noboru, Müller, Klaus-Robert, Finke, Michael, Yang, Howard Hua
A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping we answer the question: In what ratio the examples should be divided into training and testing sets in order to obtain the optimum performance. In the non-asymptotic region cross-validated early stopping always decreases the generalization error. Our large scale simulations done on a CM5 are in nice agreement with our analytical findings.
Plasticity of Center-Surround Opponent Receptive Fields in Real and Artificial Neural Systems of Vision
Yasui, S., Furukawa, T., Yamada, M., Saito, T.
The center-surround opponent receptive field (CSRF) mechanism represents one such example. Here, analogous CSRFs are shown to be formed in an artificial neural network which learns to localize contours (edges) of the luminance difference. Furthermore, when the input pattern is corrupted by a background noise, the CSRFs of the hidden units becomes shallower and broader with decrease of the signal-to-noise ratio (SNR). The same kind of SNR-dependent plasticity is present in the CSRF of real visual neurons; in bipolar cells of the carp retina as is shown here experimentally, as well as in large monopolar cells of the fly compound eye as was described by others. Also, analogous SNRdependent plasticity is shown to be present in the biphasic flash responses (BPFR) of these artificial and biological visual systems. Thus, the spatial (CSRF) and temporal (BPFR) filtering properties with which a wide variety of creatures see the world appear to be optimized for detectability of changes in space and time. 1 INTRODUCTION A number of learning algorithms have been developed to make synthetic neural machines be trainable to function in certain optimal ways. If the brain and nervous systems that we see in nature are best answers of the evolutionary process, then one might be able to find some common'softwares' in real and artificial neural systems. This possibility is examined in this paper, with respect to a basic visual 160 S. Y ASUI, T. FURUKAWA, M. YAMADA, T. SAITO