Goto

Collaborating Authors

 Technology


Neural Networks with Quadratic VC Dimension

Neural Information Processing Systems

A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function /, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {O, I} and are defined on a common domain X.


Learning with ensembles: How overfitting can be useful

Neural Information Processing Systems

We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data. 1 INTRODUCTION An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.


A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

Neural Information Processing Systems

We work in a setting in which we must choose the right number of parameters for a hypothesis function in response to a finite training sample, with the goal of minimizing the resulting generalization error. There is a large and interesting literature on cross validation methods, which often emphasizes asymptotic statistical properties, or the exact calculation of the generalization error for simple models. Our approach here is somewhat different, and is pri mari I y inspired by two sources. The first is the work of Barron and Cover [2], who introduced the idea of bounding the error of a model selection method (in their case, the Minimum Description Length Principle) in terms of a quantity known as the index of resolvability. The second is the work of Vapnik [5], who provided extremely powerful and general tools for uniformly bounding the deviations between training and generalization errors. We combine these methods to give a new and general analysis of cross validation performance. In the first and more formal part of the paper, we give a rigorous bound on the error of cross validation in terms of two parameters of the underlying model selection problem: the approximation rate and the estimation rate. In the second and more experimental part of the paper, we investigate the implications of our bound for choosing'Y, the fraction of data withheld for testing in cross validation. The most interesting aspect of this analysis is the identification of several qualitative properties of the optimal'Y that appear to be invariant over a wide class of model selection problems: - When the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of'Y.


Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

Neural Information Processing Systems

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping we answer the question: In what ratio the examples should be divided into training and testing sets in order to obtain the optimum performance. In the non-asymptotic region cross-validated early stopping always decreases the generalization error. Our large scale simulations done on a CM5 are in nice agreement with our analytical findings.


Learning Model Bias

Neural Information Processing Systems

In this paper the problem of learning appropriate domain-specific bias is addressed. It is shown that this can be achieved by learning many related tasks from the same domain, and a theorem is given bounding the number tasks that must be learnt.


Plasticity of Center-Surround Opponent Receptive Fields in Real and Artificial Neural Systems of Vision

Neural Information Processing Systems

The center-surround opponent receptive field (CSRF) mechanism represents one such example. Here, analogous CSRFs are shown to be formed in an artificial neural network which learns to localize contours (edges) of the luminance difference. Furthermore, when the input pattern is corrupted by a background noise, the CSRFs of the hidden units becomes shallower and broader with decrease of the signal-to-noise ratio (SNR). The same kind of SNR-dependent plasticity is present in the CSRF of real visual neurons; in bipolar cells of the carp retina as is shown here experimentally, as well as in large monopolar cells of the fly compound eye as was described by others. Also, analogous SNRdependent plasticity is shown to be present in the biphasic flash responses (BPFR) of these artificial and biological visual systems. Thus, the spatial (CSRF) and temporal (BPFR) filtering properties with which a wide variety of creatures see the world appear to be optimized for detectability of changes in space and time. 1 INTRODUCTION A number of learning algorithms have been developed to make synthetic neural machines be trainable to function in certain optimal ways. If the brain and nervous systems that we see in nature are best answers of the evolutionary process, then one might be able to find some common'softwares' in real and artificial neural systems. This possibility is examined in this paper, with respect to a basic visual 160 S. Y ASUI, T. FURUKAWA, M. YAMADA, T. SAITO


Simulation of a Thalamocortical Circuit for Computing Directional Heading in the Rat

Neural Information Processing Systems

Several regions of the rat brain contain neurons known as head-direction celis, which encode the animal's directional heading during spatial navigation. This paper presents a biophysical model of head-direction cell acti vity, which suggests that a thalamocortical circuit might compute the rat's head direction by integrating the angular velocity of the head over time. The model was implemented using the neural simulator NEURON, and makes testable predictions about the structure and function of the rat head-direction circuit.


Independent Component Analysis of Electroencephalographic Data

Neural Information Processing Systems

Because of the distance between the skull and brain and their different resistivities, electroencephalographic (EEG) data collected from any point on the human scalp includes activity generated within a large brain area. This spatial smearing of EEG data by volume conduction does not involve significant time delays, however, suggesting that the Independent Component Analysis (ICA) algorithm of Bell and Sejnowski [1] is suitable for performing blind source separation on EEG data.


Cholinergic suppression of transmission may allow combined associative memory function and self-organization in the neocortex

Neural Information Processing Systems

Selective suppression of transmission at feedback synapses during learning is proposed as a mechanism for combining associative feedback with self-organization of feed forward synapses. Experimental data demonstrates cholinergic suppression of synaptic transmission in layer I (feedback synapses), and a lack of suppression in layer IV (feedforward synapses). A network with this feature uses local rules to learn mappings which are not linearly separable. During learning, sensory stimuli and desired response are simultaneously presented as input. Feedforward connections form self-organized representations of input, while suppressed feedback connections learn the transpose of feedforward connectivity. During recall, suppression is removed, sensory input activates the self-organized representation, and activity generates the learned response.


Temporal coding in the sub-millisecond range: Model of barn owl auditory pathway

Neural Information Processing Systems

Binaural coincidence detection is essential for the localization of external sounds and requires auditory signal processing with high temporal precision. We present an integrate-and-fire model of spike processing in the auditory pathway of the barn owl. It is shown that a temporal precision in the microsecond range can be achieved with neuronal time constants which are at least one magnitude longer. An important feature of our model is an unsupervised Hebbian learning rule which leads to a temporal fine tuning of the neuronal connections.